Advanced Statistics are making a major impact in the National Hockey League. This is a professional sport where the amount of money a team can spend on average (Salary Cap Hit) is strictly limited. Therefore, decisions on how much, and how long, to pay players are crucial to a team’s success. Typically, these contract negotians include some type of ‘comparables’ where the performance of other similar players is used to set a market value. The goal for this assignment is as follows:
Determine the metrics which best represent player performance as it relates to salary cap.
Create a prediction function to estimate a player’s value based on the aforementioned metrics.
Use the estimated player values to attempt a prediction on team success.
suppressWarnings(suppressMessages(library(XML)))
suppressWarnings(suppressMessages(library(httr)))
suppressWarnings(suppressMessages(library(rvest)))
suppressWarnings(suppressMessages(library(stringr)))
suppressWarnings(suppressMessages(library(dplyr)))
suppressWarnings(suppressMessages(library(tidyr)))
suppressWarnings(suppressMessages(library(data.table)))
suppressWarnings(suppressMessages(library(ggplot2)))
suppressWarnings(suppressMessages(library(ggthemes)))
suppressWarnings(suppressMessages(library(knitr)))
suppressWarnings(suppressMessages(library(kableExtra)))
This function is needed to load all the tables of multi-page html files.
fill_df <- function(pages, data_df, url) {
for(i in 2:pages){
tmp_url <- paste0(url,i)
players_html <- read_html(tmp_url)
tmp_df <- as.data.frame(players_html %>% html_nodes(xpath = "//*/table[@id='brwt']") %>% html_table())
data_df <- rbind(data_df, tmp_df)
}
return(data_df)
}
This function formats values to millions.
formattermillion <- function(x){
x/1000000
}
#Even Strength
nst_on_ice_doc_2015 <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20142015&stype=2&sit=5v5&score=all&stdoi=oi&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
nst_individual_doc_2015 <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20142015&stype=2&sit=5v5&score=all&stdoi=std&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
players_html_2015 <- read_html("https://www.capfriendly.com/browse/active/2015/caphit/all/all/all/desc/1")
nst_team_doc_2015 <- read_html("https://www.naturalstattrick.com/teamtable.php?season=20142015&stype=2&sit=5v5&score=all&rate=n&vs=all&loc=B&gpf=82&fd=2014-10-08&td=2015-04-11")
nst_on_ice_doc_2016 <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20152016&stype=2&sit=5v5&score=all&stdoi=oi&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
nst_individual_doc_2016 <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20152016&stype=2&sit=5v5&score=all&stdoi=std&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
players_html_2016 <- read_html("https://www.capfriendly.com/browse/active/2016/caphit/all/all/all/desc/1")
nst_team_doc_2016 <- read_html("https://www.naturalstattrick.com/teamtable.php?season=20152016&stype=2&sit=5v5&score=all&rate=n&vs=all&loc=B&gpf=82&fd=2015-10-07&td=2016-04-10")
nst_on_ice_doc_2017 <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20162017&stype=2&sit=5v5&score=all&stdoi=oi&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
nst_individual_doc_2017 <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20162017&stype=2&sit=5v5&score=all&stdoi=std&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
players_html_2017 <- read_html("https://www.capfriendly.com/browse/active/2017/caphit/all/all/all/desc/1")
nst_team_doc_2017 <- read_html("https://www.naturalstattrick.com/teamtable.php?season=20162017&stype=2&sit=5v5&score=all&rate=n&vs=all&loc=B&gpf=82&fd=2016-10-12&td=2017-04-09")
nst_on_ice_doc_2018 <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20172018&stype=2&sit=5v5&score=all&stdoi=oi&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
nst_individual_doc_2018 <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20172018&stype=2&sit=5v5&score=all&stdoi=std&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
players_html_2018 <- read_html("https://www.capfriendly.com/browse/active/2018/caphit/all/all/all/desc/1")
nst_team_doc_2018 <- read_html("https://www.naturalstattrick.com/teamtable.php?season=20172018&stype=2&sit=5v5&score=all&rate=n&vs=all&loc=B&gpf=82&fd=2016-10-12&td=2017-04-09")
#PowerPlay
nst_on_ice_doc_2015_pp <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20142015&stype=2&sit=pp&score=all&stdoi=oi&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
nst_individual_doc_2015_pp <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20142015&stype=2&sit=pp&score=all&stdoi=std&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
nst_on_ice_doc_2016_pp <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20152016&stype=2&sit=pp&score=all&stdoi=oi&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
nst_individual_doc_2016_pp <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20152016&stype=2&sit=pp&score=all&stdoi=std&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
nst_on_ice_doc_2017_pp <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20162017&stype=2&sit=pp&score=all&stdoi=oi&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
nst_individual_doc_2017_pp <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20162017&stype=2&sit=pp&score=all&stdoi=std&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
nst_on_ice_doc_2018_pp <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20172018&stype=2&sit=pp&score=all&stdoi=oi&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
nst_individual_doc_2018_pp <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20172018&stype=2&sit=pp&score=all&stdoi=std&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
#PenaltyKill
nst_on_ice_doc_2015_pk <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20142015&stype=2&sit=pk&score=all&stdoi=oi&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
nst_individual_doc_2015_pk <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20142015&stype=2&sit=pk&score=all&stdoi=std&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
nst_on_ice_doc_2016_pk <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20152016&stype=2&sit=pk&score=all&stdoi=oi&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
nst_individual_doc_2016_pk <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20152016&stype=2&sit=pk&score=all&stdoi=std&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
nst_on_ice_doc_2017_pk <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20162017&stype=2&sit=pk&score=all&stdoi=oi&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
nst_individual_doc_2017_pk <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20162017&stype=2&sit=pk&score=all&stdoi=std&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
nst_on_ice_doc_2018_pk <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20172018&stype=2&sit=pk&score=all&stdoi=oi&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
nst_individual_doc_2018_pk <- read_html("https://www.naturalstattrick.com/playerteams.php?season=20172018&stype=2&sit=pk&score=all&stdoi=std&rate=n&team=ALL&pos=S&loc=B&toi=0&gpfilt=none&fd=&td=&tgp=82&lines=multi")
team_cap_html <- read_html("https://www.capfriendly.com/")
team_cap_df <- as.data.frame(team_cap_html %>% html_nodes(xpath = "//*/table[@id='ich']") %>% html_table())
#Even Strength
nst_on_ice_players_2015 <- as.data.frame(nst_on_ice_doc_2015 %>% html_nodes(xpath = "//*/table[@id='players']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE)) %>% mutate(year = 2015)
nst_on_ice_players_2016 <- as.data.frame(nst_on_ice_doc_2016 %>% html_nodes(xpath = "//*/table[@id='players']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE)) %>% mutate(year = 2016)
nst_on_ice_players_2017 <- as.data.frame(nst_on_ice_doc_2017 %>% html_nodes(xpath = "//*/table[@id='players']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE)) %>% mutate(year = 2017)
nst_on_ice_players_2018 <- as.data.frame(nst_on_ice_doc_2018 %>% html_nodes(xpath = "//*/table[@id='players']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE)) %>% mutate(year = 2018)
#Power Play
nst_on_ice_players_2015_pp <- as.data.frame(nst_on_ice_doc_2015_pp %>% html_nodes(xpath = "//*/table[@id='players']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE)) %>% mutate(year = 2015)
colnames(nst_on_ice_players_2015_pp) <- paste(colnames(nst_on_ice_players_2015_pp), "pp", sep = "_")
nst_on_ice_players_2015_pp <- nst_on_ice_players_2015_pp %>% rename(Player = Player_pp, Team = Team_pp, Position = Position_pp, TOI = TOI_pp, GP = GP_pp, Var.1 = Var.1_pp)
nst_on_ice_players_2016_pp <- as.data.frame(nst_on_ice_doc_2016_pp %>% html_nodes(xpath = "//*/table[@id='players']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE)) %>% mutate(year = 2016)
colnames(nst_on_ice_players_2016_pp) <- paste(colnames(nst_on_ice_players_2016_pp), "pp", sep = "_")
nst_on_ice_players_2016_pp <- nst_on_ice_players_2016_pp %>% rename(Player = Player_pp, Team = Team_pp, Position = Position_pp, TOI = TOI_pp, GP = GP_pp, Var.1 = Var.1_pp)
nst_on_ice_players_2017_pp <- as.data.frame(nst_on_ice_doc_2017_pp %>% html_nodes(xpath = "//*/table[@id='players']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE)) %>% mutate(year = 2017)
colnames(nst_on_ice_players_2017_pp) <- paste(colnames(nst_on_ice_players_2017_pp), "pp", sep = "_")
nst_on_ice_players_2017_pp <- nst_on_ice_players_2017_pp %>% rename(Player = Player_pp, Team = Team_pp, Position = Position_pp, TOI = TOI_pp, GP = GP_pp, Var.1 = Var.1_pp)
nst_on_ice_players_2018_pp <- as.data.frame(nst_on_ice_doc_2018_pp %>% html_nodes(xpath = "//*/table[@id='players']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE)) %>% mutate(year = 2018)
colnames(nst_on_ice_players_2018_pp) <- paste(colnames(nst_on_ice_players_2018_pp), "pp", sep = "_")
nst_on_ice_players_2018_pp <- nst_on_ice_players_2018_pp %>% rename(Player = Player_pp, Team = Team_pp, Position = Position_pp, TOI = TOI_pp, GP = GP_pp, Var.1 = Var.1_pp)
#Penalty Kill
nst_on_ice_players_2015_pk <- as.data.frame(nst_on_ice_doc_2015_pk %>% html_nodes(xpath = "//*/table[@id='players']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE)) %>% mutate(year = 2015)
colnames(nst_on_ice_players_2015_pk) <- paste(colnames(nst_on_ice_players_2015_pk), "pk", sep = "_")
nst_on_ice_players_2015_pk <- nst_on_ice_players_2015_pk %>% rename(Player = Player_pk, Team = Team_pk, Position = Position_pk, TOI = TOI_pk, GP = GP_pk, Var.1 = Var.1_pk)
nst_on_ice_players_2016_pk <- as.data.frame(nst_on_ice_doc_2016_pk %>% html_nodes(xpath = "//*/table[@id='players']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE)) %>% mutate(year = 2016)
colnames(nst_on_ice_players_2016_pk) <- paste(colnames(nst_on_ice_players_2016_pk), "pk", sep = "_")
nst_on_ice_players_2016_pk <- nst_on_ice_players_2016_pk %>% rename(Player = Player_pk, Team = Team_pk, Position = Position_pk, TOI = TOI_pk, GP = GP_pk, Var.1 = Var.1_pk)
nst_on_ice_players_2017_pk <- as.data.frame(nst_on_ice_doc_2017_pk %>% html_nodes(xpath = "//*/table[@id='players']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE)) %>% mutate(year = 2017)
colnames(nst_on_ice_players_2017_pk) <- paste(colnames(nst_on_ice_players_2017_pk), "pk", sep = "_")
nst_on_ice_players_2017_pk <- nst_on_ice_players_2017_pk %>% rename(Player = Player_pk, Team = Team_pk, Position = Position_pk, TOI = TOI_pk, GP = GP_pk, Var.1 = Var.1_pk)
nst_on_ice_players_2018_pk <- as.data.frame(nst_on_ice_doc_2018_pk %>% html_nodes(xpath = "//*/table[@id='players']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE)) %>% mutate(year = 2018)
colnames(nst_on_ice_players_2018_pk) <- paste(colnames(nst_on_ice_players_2018_pk), "pk", sep = "_")
nst_on_ice_players_2018_pk <- nst_on_ice_players_2018_pk %>% rename(Player = Player_pk, Team = Team_pk, Position = Position_pk, TOI = TOI_pk, GP = GP_pk, Var.1 = Var.1_pk)
#Even Strength
nst_individual_players_2015 <- as.data.frame(nst_individual_doc_2015 %>% html_nodes(xpath = "//*/table[@id='indreg']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE))
nst_individual_players_2016 <- as.data.frame(nst_individual_doc_2016 %>% html_nodes(xpath = "//*/table[@id='indreg']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE))
nst_individual_players_2017 <- as.data.frame(nst_individual_doc_2017 %>% html_nodes(xpath = "//*/table[@id='indreg']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE))
nst_individual_players_2018 <- as.data.frame(nst_individual_doc_2018 %>% html_nodes(xpath = "//*/table[@id='indreg']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE))
#Power Play
nst_individual_players_2015_pp <- as.data.frame(nst_individual_doc_2015_pp %>% html_nodes(xpath = "//*/table[@id='indreg']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE))
colnames(nst_individual_players_2015_pp) <- paste(colnames(nst_individual_players_2015_pp), "pp", sep = "_")
nst_individual_players_2015_pp <- nst_individual_players_2015_pp %>% rename(Player = Player_pp, Team = Team_pp, Position = Position_pp, TOI = TOI_pp, GP = GP_pp, Var.1 = Var.1_pp)
nst_individual_players_2016_pp <- as.data.frame(nst_individual_doc_2016_pp %>% html_nodes(xpath = "//*/table[@id='indreg']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE))
colnames(nst_individual_players_2016_pp) <- paste(colnames(nst_individual_players_2016_pp), "pp", sep = "_")
nst_individual_players_2016_pp <- nst_individual_players_2016_pp %>% rename(Player = Player_pp, Team = Team_pp, Position = Position_pp, TOI = TOI_pp, GP = GP_pp, Var.1 = Var.1_pp)
nst_individual_players_2017_pp <- as.data.frame(nst_individual_doc_2017_pp %>% html_nodes(xpath = "//*/table[@id='indreg']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE))
colnames(nst_individual_players_2017_pp) <- paste(colnames(nst_individual_players_2017_pp), "pp", sep = "_")
nst_individual_players_2017_pp <- nst_individual_players_2017_pp %>% rename(Player = Player_pp, Team = Team_pp, Position = Position_pp, TOI = TOI_pp, GP = GP_pp, Var.1 = Var.1_pp)
nst_individual_players_2018_pp <- as.data.frame(nst_individual_doc_2018_pp %>% html_nodes(xpath = "//*/table[@id='indreg']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE))
colnames(nst_individual_players_2018_pp) <- paste(colnames(nst_individual_players_2018_pp), "pp", sep = "_")
nst_individual_players_2018_pp <- nst_individual_players_2018_pp %>% rename(Player = Player_pp, Team = Team_pp, Position = Position_pp, TOI = TOI_pp, GP = GP_pp, Var.1 = Var.1_pp)
#Penalty Kill
nst_individual_players_2015_pk <- as.data.frame(nst_individual_doc_2015_pk %>% html_nodes(xpath = "//*/table[@id='indreg']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE))
colnames(nst_individual_players_2015_pk) <- paste(colnames(nst_individual_players_2015_pk), "pk", sep = "_")
nst_individual_players_2015_pk <- nst_individual_players_2015_pk %>% rename(Player = Player_pk, Team = Team_pk, Position = Position_pk, TOI = TOI_pk, GP = GP_pk, Var.1 = Var.1_pk)
nst_individual_players_2016_pk <- as.data.frame(nst_individual_doc_2016_pk %>% html_nodes(xpath = "//*/table[@id='indreg']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE))
colnames(nst_individual_players_2016_pk) <- paste(colnames(nst_individual_players_2016_pk), "pk", sep = "_")
nst_individual_players_2016_pk <- nst_individual_players_2016_pk %>% rename(Player = Player_pk, Team = Team_pk, Position = Position_pk, TOI = TOI_pk, GP = GP_pk, Var.1 = Var.1_pk)
nst_individual_players_2017_pk <- as.data.frame(nst_individual_doc_2017_pk %>% html_nodes(xpath = "//*/table[@id='indreg']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE))
colnames(nst_individual_players_2017_pk) <- paste(colnames(nst_individual_players_2017_pk), "pk", sep = "_")
nst_individual_players_2017_pk <- nst_individual_players_2017_pk %>% rename(Player = Player_pk, Team = Team_pk, Position = Position_pk, TOI = TOI_pk, GP = GP_pk, Var.1 = Var.1_pk)
nst_individual_players_2018_pk <- as.data.frame(nst_individual_doc_2018_pk %>% html_nodes(xpath = "//*/table[@id='indreg']") %>% html_table()) %>% mutate(Player = gsub(pattern = "\u00A0", replacement = '_', x = Player, fixed = TRUE))
colnames(nst_individual_players_2018_pk) <- paste(colnames(nst_individual_players_2018_pk), "pk", sep = "_")
nst_individual_players_2018_pk <- nst_individual_players_2018_pk %>% rename(Player = Player_pk, Team = Team_pk, Position = Position_pk, TOI = TOI_pk, GP = GP_pk, Var.1 = Var.1_pk)
last_page_2015 <- as.numeric(str_match_all(players_html_2015 %>% html_nodes(xpath = "//*/div[@class='r']") %>% html_node('span') %>% html_text(), "of\\s+(\\d+)")[[1]][2])
players_cap_raw_df_2015 <- as.data.frame(players_html_2015 %>% html_nodes(xpath = "//*/table[@id='brwt']") %>% html_table())
players_cap_raw_df_2015 <- fill_df(last_page_2015, players_cap_raw_df_2015, "https://www.capfriendly.com/browse/active/2015/caphit/all/all/all/desc/")
last_page_2016 <- as.numeric(str_match_all(players_html_2016 %>% html_nodes(xpath = "//*/div[@class='r']") %>% html_node('span') %>% html_text(), "of\\s+(\\d+)")[[1]][2])
players_cap_raw_df_2016 <- as.data.frame(players_html_2016 %>% html_nodes(xpath = "//*/table[@id='brwt']") %>% html_table())
players_cap_raw_df_2016 <- fill_df(last_page_2016, players_cap_raw_df_2016, "https://www.capfriendly.com/browse/active/2017/caphit/all/all/all/desc/")
last_page_2017 <- as.numeric(str_match_all(players_html_2017 %>% html_nodes(xpath = "//*/div[@class='r']") %>% html_node('span') %>% html_text(), "of\\s+(\\d+)")[[1]][2])
players_cap_raw_df_2017 <- as.data.frame(players_html_2017 %>% html_nodes(xpath = "//*/table[@id='brwt']") %>% html_table())
players_cap_raw_df_2017 <- fill_df(last_page_2017, players_cap_raw_df_2017, "https://www.capfriendly.com/browse/active/2017/caphit/all/all/all/desc/")
last_page_2018 <- as.numeric(str_match_all(players_html_2018 %>% html_nodes(xpath = "//*/div[@class='r']") %>% html_node('span') %>% html_text(), "of\\s+(\\d+)")[[1]][2])
players_cap_raw_df_2018 <- as.data.frame(players_html_2018 %>% html_nodes(xpath = "//*/table[@id='brwt']") %>% html_table())
players_cap_raw_df_2018 <- fill_df(last_page_2018, players_cap_raw_df_2018, "https://www.capfriendly.com/browse/active/2018/caphit/all/all/all/desc/")
nst_team_2015_df <- as.data.frame(nst_team_doc_2015 %>% html_nodes(xpath = "//*/table[@id='teams']") %>% html_table())
nst_team_2015_df$year <- 2015
nst_team_2016_df <- as.data.frame(nst_team_doc_2016 %>% html_nodes(xpath = "//*/table[@id='teams']") %>% html_table())
nst_team_2016_df$year <- 2016
nst_team_2017_df <- as.data.frame(nst_team_doc_2017 %>% html_nodes(xpath = "//*/table[@id='teams']") %>% html_table())
nst_team_2017_df$year <- 2017
nst_team_2018_df <- as.data.frame(nst_team_doc_2018 %>% html_nodes(xpath = "//*/table[@id='teams']") %>% html_table())
nst_team_2018_df$year <- 2018
#Be kind, rewind.... and cache web scrapes.
load(url("https://github.com/john-grando/data607-finalproject/raw/master/html_files.RData"))
#remove duplicate columns
drop <- c("","Position","TOI", "GP", "Var.1")
nst_raw_players_2015 <- left_join(nst_on_ice_players_2015[, !names(nst_on_ice_players_2015) %in% c("Var.1")], nst_individual_players_2015[, !names(nst_individual_players_2015) %in% drop], by = c("Player" = "Player","Team" = "Team")) %>% left_join(nst_on_ice_players_2015_pp[, !names(nst_on_ice_players_2015_pp) %in% drop], by = c("Player" = "Player","Team" = "Team")) %>% left_join(nst_individual_players_2015_pp[, !names(nst_individual_players_2015_pp) %in% drop], by = c("Player" = "Player","Team" = "Team")) %>% left_join(nst_on_ice_players_2015_pk[, !names(nst_on_ice_players_2015_pk) %in% drop], by = c("Player" = "Player","Team" = "Team")) %>% left_join(nst_individual_players_2015_pk[, !names(nst_individual_players_2015_pk) %in% drop], by = c("Player" = "Player","Team" = "Team"))
nst_raw_players_2016 <- left_join(nst_on_ice_players_2016[, !names(nst_on_ice_players_2016) %in% c("Var.1")], nst_individual_players_2016[, !names(nst_individual_players_2016) %in% drop], by = c("Player" = "Player","Team" = "Team")) %>% left_join(nst_on_ice_players_2016_pp[, !names(nst_on_ice_players_2016_pp) %in% drop], by = c("Player" = "Player","Team" = "Team")) %>% left_join(nst_individual_players_2016_pp[, !names(nst_individual_players_2016_pp) %in% drop], by = c("Player" = "Player","Team" = "Team")) %>% left_join(nst_on_ice_players_2016_pk[, !names(nst_on_ice_players_2016_pk) %in% drop], by = c("Player" = "Player","Team" = "Team")) %>% left_join(nst_individual_players_2016_pk[, !names(nst_individual_players_2016_pk) %in% drop], by = c("Player" = "Player","Team" = "Team"))
nst_raw_players_2017 <- left_join(nst_on_ice_players_2017[, !names(nst_on_ice_players_2017) %in% c("Var.1")], nst_individual_players_2017[, !names(nst_individual_players_2017) %in% drop], by = c("Player" = "Player","Team" = "Team")) %>% left_join(nst_on_ice_players_2017_pp[, !names(nst_on_ice_players_2017_pp) %in% drop], by = c("Player" = "Player","Team" = "Team")) %>% left_join(nst_individual_players_2017_pp[, !names(nst_individual_players_2017_pp) %in% drop], by = c("Player" = "Player","Team" = "Team")) %>% left_join(nst_on_ice_players_2017_pk[, !names(nst_on_ice_players_2017_pk) %in% drop], by = c("Player" = "Player","Team" = "Team")) %>% left_join(nst_individual_players_2017_pk[, !names(nst_individual_players_2017_pk) %in% drop], by = c("Player" = "Player","Team" = "Team"))
nst_raw_players_2018 <- left_join(nst_on_ice_players_2018[, !names(nst_on_ice_players_2018) %in% c("Var.1")], nst_individual_players_2018[, !names(nst_individual_players_2018) %in% drop], by = c("Player" = "Player","Team" = "Team")) %>% left_join(nst_on_ice_players_2018_pp[, !names(nst_on_ice_players_2018_pp) %in% drop], by = c("Player" = "Player","Team" = "Team")) %>% left_join(nst_individual_players_2018_pp[, !names(nst_individual_players_2018_pp) %in% drop], by = c("Player" = "Player","Team" = "Team")) %>% left_join(nst_on_ice_players_2018_pk[, !names(nst_on_ice_players_2018_pk) %in% drop], by = c("Player" = "Player","Team" = "Team")) %>% left_join(nst_individual_players_2018_pk[, !names(nst_individual_players_2018_pk) %in% drop], by = c("Player" = "Player","Team" = "Team"))
nst_players_2015 <- nst_raw_players_2015 %>% mutate(Player = as.character(Player)) %>% mutate(Player = gsub("[^[:alpha:]]","_",Player))
keep <- c("PLAYER","EXPIRY","CAP.HIT..")
nst_players_2016 <- nst_raw_players_2016 %>% mutate(Player = as.character(Player)) %>% mutate(Player = gsub("[^[:alpha:]]","_",Player))
keep <- c("PLAYER","EXPIRY","CAP.HIT..")
nst_players_2017 <- nst_raw_players_2017 %>% mutate(Player = as.character(Player)) %>% mutate(Player = gsub("[^[:alpha:]]","_",Player))
nst_players_2018 <- nst_raw_players_2018 %>% mutate(Player = as.character(Player)) %>% mutate(Player = gsub("[^[:alpha:]]","_",Player))
keep <- c("PLAYER","EXPIRY","CAP.HIT..")
players_cap_df_2015 <- players_cap_raw_df_2015[names(players_cap_raw_df_2015) %in% keep] %>% mutate(PLAYER = as.character(trimws(sapply(PLAYER, function(x){str_match_all(x, "\\.(.*)")[[1]][2]})))) %>% mutate(PLAYER = gsub("[^[:alpha:]]","_",PLAYER)) %>% mutate(CAP.HIT.. = as.numeric(gsub("[^0-9]","",CAP.HIT..)))
players_cap_df_2016 <- players_cap_raw_df_2016[names(players_cap_raw_df_2016) %in% keep] %>% mutate(PLAYER = as.character(trimws(sapply(PLAYER, function(x){str_match_all(x, "\\.(.*)")[[1]][2]})))) %>% mutate(PLAYER = gsub("[^[:alpha:]]","_",PLAYER)) %>% mutate(CAP.HIT.. = as.numeric(gsub("[^0-9]","",CAP.HIT..)))
players_cap_df_2017 <- players_cap_raw_df_2017[names(players_cap_raw_df_2017) %in% keep] %>% mutate(PLAYER = as.character(trimws(sapply(PLAYER, function(x){str_match_all(x, "\\.(.*)")[[1]][2]})))) %>% mutate(PLAYER = gsub("[^[:alpha:]]","_",PLAYER)) %>% mutate(CAP.HIT.. = as.numeric(gsub("[^0-9]","",CAP.HIT..)))
players_cap_df_2018 <- players_cap_raw_df_2018[names(players_cap_raw_df_2018) %in% keep] %>% mutate(PLAYER = as.character(trimws(sapply(PLAYER, function(x){str_match_all(x, "\\.(.*)")[[1]][2]})))) %>% mutate(PLAYER = gsub("[^[:alpha:]]","_",PLAYER)) %>% mutate(CAP.HIT.. = as.numeric(gsub("[^0-9]","",CAP.HIT..)))
setnames(players_cap_df_2015, old=c("PLAYER"), new=c("Player"))
setnames(players_cap_df_2016, old=c("PLAYER"), new=c("Player"))
setnames(players_cap_df_2017, old=c("PLAYER"), new=c("Player"))
setnames(players_cap_df_2018, old=c("PLAYER"), new=c("Player"))
full_player_df_2015 <- unique(inner_join(nst_players_2015, players_cap_df_2015, by = "Player"))
full_player_df_2016 <- unique(inner_join(nst_players_2016, players_cap_df_2016, by = "Player"))
full_player_df_2017 <- unique(inner_join(nst_players_2017, players_cap_df_2017, by = "Player"))
full_player_df_2018 <- unique(inner_join(nst_players_2018, players_cap_df_2018, by = "Player"))
full_player_df <- rbind(full_player_df_2015, full_player_df_2016, full_player_df_2017, full_player_df_2018)
player_per_game_df <- full_player_df[,c("CF","CA","FF","FA","SF","SA","GF","GA","SCF","SCA","HDCF","HDCA","HDGF","HDGA","MDCF","MDCA", "MDGF","MDGA","LDCF","LDCA","LDGF","LDGA","Off..Zone.Faceoffs","Neu..Zone.Faceoffs","Def..Zone.Faceoffs","Goals","Total.Assists", "First.Assists","Second.Assists","Total.Points","Shots","iCF","iSCF","iHDCF", "Rebounds.Created", "PIM", "Total.Penalties", "Minor", "Major", "Misconduct", "Penalties.Drawn", "Giveaways", "Takeaways", "Hits", "Hits.Taken", "Shots.Blocked", "CF_pp", "CA_pp", "FF_pp", "FA_pp", "SF_pp", "SA_pp", "GF_pp", "GA_pp", "SCF_pp", "SCA_pp", "HDCF_pp", "HDCA_pp", "HDGF_pp", "HDGA_pp", "MDCF_pp", "MDCA_pp", "MDGF_pp", "MDGA_pp", "LDCF_pp", "LDCA_pp", "LDGF_pp", "LDGA_pp", "Off..Zone.Faceoffs_pp","Neu..Zone.Faceoffs_pp","Def..Zone.Faceoffs_pp","Goals_pp","Total.Assists_pp", "First.Assists_pp", "Second.Assists_pp", "Total.Points_pp", "Shots_pp", "iCF_pp","iSCF_pp", "iHDCF_pp", "Rebounds.Created_pp", "PIM_pp", "Total.Penalties_pp", "Minor_pp", "Major_pp", "Misconduct_pp", "Penalties.Drawn_pp", "Giveaways_pp", "Takeaways_pp", "Hits_pp", "Hits.Taken_pp", "Shots.Blocked_pp", "CF_pk", "CA_pk", "FF_pk", "FA_pk", "SF_pk", "SA_pk", "GF_pk", "GA_pk", "SCF_pk", "SCA_pk", "HDCF_pk", "HDCA_pk", "HDGF_pk", "HDGA_pk", "MDCF_pk", "MDCA_pk", "MDGF_pk", "MDGA_pk", "LDCF_pk","LDCA_pk", "LDGF_pk","LDGA_pk", "Off..Zone.Faceoffs_pk", "Neu..Zone.Faceoffs_pk", "Def..Zone.Faceoffs_pk","Goals_pk","Total.Assists_pk", "First.Assists_pk", "Second.Assists_pk", "Total.Points_pk", "Shots_pk", "iCF_pk","iSCF_pk", "iHDCF_pk", "Rebounds.Created_pk", "PIM_pk", "Total.Penalties_pk", "Minor_pk", "Major_pk", "Misconduct_pk", "Penalties.Drawn_pk", "Giveaways_pk", "Takeaways_pk", "Hits_pk", "Hits.Taken_pk", "Shots.Blocked_pk")] / full_player_df[,5]
player_per_game_df <- cbind(full_player_df[,1:4],player_per_game_df, full_player_df[,c("TOI","EXPIRY", "CAP.HIT..", "year")])
player_per_game_df$Position <- ifelse(player_per_game_df$Position=="D","D","F")
player_per_game_df$Position <- factor(player_per_game_df$Position, levels = c("F", "D"))
Here, it is noted that not all teams spend to their maximum cap so it is important to analyze the individual metrics for players.
team_cap_df <- team_cap_df %>% mutate(CAP.HIT = 75000000 - as.numeric(gsub('[$,]', '', PROJECTED.CAP.SPACE)))
ggplot(team_cap_df, aes(x=reorder(TEAM, -CAP.HIT), y=CAP.HIT)) + geom_col(fill = "steelblue") + theme_economist() + labs(x = "Team", y = "Cap Hit (Millions)", title = "NHL Teams") + theme(plot.title = element_text(hjust = 0.5), text = element_text(size=12)) + coord_flip() + scale_y_continuous(limits = c(0,80000000), breaks = (seq(0, 80000000, 5000000)), labels = formattermillion)
This is where the data gets a little challenging. There are actual a few different contract types in the NHL. First, there is an Entry Level Contract (ELC) where a player signs for a relatively fixed amount (typically less than one million per year for three years). Next, there is a Restricted Free Agent Contract (RFA) where a player must sign with their team, and if another team were to extend an “offer sheet” to a player to have him switch teams, it would come with additional compensation to the former team (draft picks). Finally, there are Unrestricted Free Agent Contracts (UFA) where players may sign for any amount or term (with some capped restrictions). In order to try and determine a “true” player value, this analysis tried to focus on UFA contracts by basing a filter on the expiration status of a contract (RFA, UFA) and cap hit (more than one million dollars).
ufa_player_df <- player_per_game_df[player_per_game_df$EXPIRY=="UFA" & player_per_game_df$CAP.HIT..>1000000 & player_per_game_df$GP>41 & player_per_game_df$year < 2018,]
#ufa_player_df <- player_per_game_df[!duplicated(player_per_game_df[,c("Player","CAP.HIT..")]),]
f_per_game_df <- ufa_player_df[ufa_player_df$Position!="D",]
d_per_game_df <- ufa_player_df[ufa_player_df$Position=="D",]
all_median = median(player_per_game_df$CAP.HIT..)
ufa_median = median(ufa_player_df$CAP.HIT..)
#ggplot(player_per_game_df, aes(x=CF, fill=Position)) + geom_histogram(bins = 20)
ggplot(player_per_game_df, aes(x=CAP.HIT.., fill=Position)) + geom_histogram(bins=20, position = "identity", alpha = 0.2) + facet_wrap(~EXPIRY) + geom_vline(xintercept = all_median) + geom_text(aes(x=(all_median-500000), y=150, label="All player median"), angle=90, size = 5) + geom_vline(xintercept = ufa_median) + geom_text(aes(x=(ufa_median-500000), y=150, label="UFA player median"), angle=90, size = 5) + theme_economist() + labs(x = "Cap Hit", y = "Count", title = "NHL Players") + theme(plot.title = element_text(hjust = 0.5), text = element_text(size=12), legend.position = "right") + scale_x_continuous(limits = c(0, 11000000), breaks = (seq(0, 11000000, 1000000)), labels = formattermillion) + scale_fill_manual(values=c("mediumblue", "springgreen"))
ggplot(ufa_player_df, aes(x=CAP.HIT.., fill=Position)) + geom_histogram(bins=20, position = "identity", alpha = 0.2) + geom_vline(xintercept = ufa_median) + geom_text(aes(x=(ufa_median-300000), y=30, label="UFA player median"), angle=90, size = 5) + theme_economist() + labs(x = "Cap Hit", y = "Count", title = "UFA Players") + theme(plot.title = element_text(hjust = 0.5), text = element_text(size=12), legend.position = "right") + scale_x_continuous(limits = c(0, 11000000), breaks = (seq(0, 11000000, 1000000)), labels = formattermillion) + scale_fill_manual(values=c("mediumblue", "springgreen"))
Note, since the roles of forwards and defensemen are so different, they have been separately analyzed in this study.
f_complete_player_df <- f_per_game_df[complete.cases(f_per_game_df), ]
d_complete_player_df <- d_per_game_df[complete.cases(d_per_game_df), ]
sample_size <- floor(0.8 * nrow(f_per_game_df))
set.seed(8345)
f_train_dev_ind <- sample(seq_len(nrow(f_complete_player_df)), size = sample_size)
f_train_df <-f_complete_player_df[f_train_dev_ind,]
f_dev_df <- f_complete_player_df[-f_train_dev_ind,]
sample_size <- floor(0.8 * nrow(d_per_game_df))
set.seed(3454)
d_train_dev_ind <- sample(seq_len(nrow(d_complete_player_df)), size = sample_size)
d_train_df <-d_complete_player_df[d_train_dev_ind,]
d_dev_df <- d_complete_player_df[-d_train_dev_ind,]
f_glm <- glm(data = f_train_df, CAP.HIT.. ~ CF + CA + SCF + SCA + HDCF + HDCA + MDCF + MDCA + LDCF + LDCA + Goals + Total.Assists + First.Assists + Second.Assists + Total.Points + Shots + iCF + iSCF + iHDCF + Rebounds.Created + PIM + Total.Penalties + Minor + Major + Misconduct + Penalties.Drawn + Giveaways + Takeaways + Hits + Hits.Taken + Shots.Blocked + CF_pp + CA_pp + SCF_pp + SCA_pp + HDCF_pp + HDCA_pp + MDCF_pp + MDCA_pp + LDCF_pp + LDCA_pp + Goals_pp + Total.Assists_pp + First.Assists_pp + Second.Assists_pp + Total.Points_pp + Shots_pp + iCF_pp + iSCF_pp + iHDCF_pp + Rebounds.Created_pp + PIM_pp + Total.Penalties_pp + Minor_pp + Major_pp + Misconduct_pp + Penalties.Drawn_pp + Giveaways_pp + Takeaways_pp + Hits_pp + Hits.Taken_pp + Shots.Blocked_pp + CF_pk + CA_pk + SCF_pk + SCA_pk + HDCF_pk + HDCA_pk + MDCF_pk + MDCA_pk + LDCF_pk + LDCA_pk + Goals_pk + Total.Assists_pk + First.Assists_pk + Second.Assists_pk + Total.Points_pk + Shots_pk + iCF_pk + iSCF_pk + iHDCF_pk + Rebounds.Created_pk + PIM_pk + Total.Penalties_pk + Minor_pk + Major_pk + Misconduct_pk + Penalties.Drawn_pk + Giveaways_pk + Takeaways_pk + Hits_pk + Hits.Taken_pk + Shots.Blocked_pk, family = "Gamma" (link = 'log'))
step_f_glm <- step(f_glm, trace = FALSE)
summary(step_f_glm)
##
## Call:
## glm(formula = CAP.HIT.. ~ CF + Goals + Total.Assists + Shots +
## Total.Penalties + Minor + Takeaways + Hits.Taken + CF_pp +
## CA_pp + SCA_pp + HDCF_pp + LDCA_pp + Goals_pp + Total.Assists_pp +
## Shots_pp + iSCF_pp + iHDCF_pp + SCA_pk + HDCF_pk + HDCA_pk +
## Total.Assists_pk + iCF_pk + iSCF_pk + Rebounds.Created_pk +
## Penalties.Drawn_pk, family = Gamma(link = "log"), data = f_train_df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.2470 -0.2638 -0.0255 0.2118 0.8513
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14.2087 0.1867 76.103 < 2e-16 ***
## CF 0.4760 0.2324 2.048 0.041183 *
## Goals -10.0489 4.3190 -2.327 0.020449 *
## Total.Assists 9.5873 3.3909 2.827 0.004914 **
## Shots 2.2237 0.7988 2.784 0.005612 **
## Total.Penalties -63.3366 13.0787 -4.843 1.79e-06 ***
## Minor 64.4153 13.5545 4.752 2.75e-06 ***
## Takeaways 2.6666 1.7598 1.515 0.130449
## Hits.Taken -2.3178 0.5818 -3.983 7.98e-05 ***
## CF_pp 1.8019 0.4640 3.883 0.000119 ***
## CA_pp -10.1331 5.9970 -1.690 0.091816 .
## SCA_pp 14.7489 7.3924 1.995 0.046662 *
## HDCF_pp -5.5121 1.7483 -3.153 0.001730 **
## LDCA_pp 19.6599 7.6634 2.565 0.010645 *
## Goals_pp -16.2565 8.4995 -1.913 0.056464 .
## Total.Assists_pp 14.7384 5.9731 2.467 0.013998 *
## Shots_pp 9.8613 2.8874 3.415 0.000698 ***
## iSCF_pp -9.9211 3.2312 -3.070 0.002274 **
## iHDCF_pp 9.0660 3.0494 2.973 0.003115 **
## SCA_pk -2.0989 1.2450 -1.686 0.092538 .
## HDCF_pk -22.8880 7.9500 -2.879 0.004190 **
## HDCA_pk 5.6729 3.1166 1.820 0.069429 .
## Total.Assists_pk 55.7194 26.9421 2.068 0.039231 *
## iCF_pk -12.7809 7.6090 -1.680 0.093746 .
## iSCF_pk 19.0641 11.7231 1.626 0.104646
## Rebounds.Created_pk 96.7542 29.8999 3.236 0.001307 **
## Penalties.Drawn_pk 15.0288 9.1585 1.641 0.101540
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Gamma family taken to be 0.1137775)
##
## Null deviance: 119.038 on 453 degrees of freedom
## Residual deviance: 52.455 on 427 degrees of freedom
## AIC: 14106
##
## Number of Fisher Scoring iterations: 5
d_glm <- glm(data = d_train_df, CAP.HIT.. ~ CF + CA + SCF + SCA + HDCF + HDCA + MDCF + MDCA + LDCF + LDCA + Goals + Total.Assists + First.Assists + Second.Assists + Total.Points + Shots + iCF + iSCF + iHDCF + Rebounds.Created + PIM + Total.Penalties + Minor + Major + Misconduct + Penalties.Drawn + Giveaways + Takeaways + Hits + Hits.Taken + Shots.Blocked + CF_pp + CA_pp + SCF_pp + SCA_pp + HDCF_pp + HDCA_pp + MDCF_pp + MDCA_pp + LDCF_pp + LDCA_pp + Goals_pp + Total.Assists_pp + First.Assists_pp + Second.Assists_pp + Total.Points_pp + Shots_pp + iCF_pp + iSCF_pp + iHDCF_pp + Rebounds.Created_pp + PIM_pp + Total.Penalties_pp + Minor_pp + Major_pp + Misconduct_pp + Penalties.Drawn_pp + Giveaways_pp + Takeaways_pp + Hits_pp + Hits.Taken_pp + Shots.Blocked_pp + CF_pk + CA_pk + SCF_pk + SCA_pk + HDCF_pk + HDCA_pk + MDCF_pk + MDCA_pk + LDCF_pk + LDCA_pk + Goals_pk + Total.Assists_pk + First.Assists_pk + Second.Assists_pk + Total.Points_pk + Shots_pk + iCF_pk + iSCF_pk + iHDCF_pk + Rebounds.Created_pk + PIM_pk + Total.Penalties_pk + Minor_pk + Major_pk + Misconduct_pk + Penalties.Drawn_pk + Giveaways_pk + Takeaways_pk + Hits_pk + Hits.Taken_pk + Shots.Blocked_pk, family = "Gamma" (link = 'log'))
step_d_glm <- step(d_glm, trace = FALSE)
summary(step_d_glm)
##
## Call:
## glm(formula = CAP.HIT.. ~ Goals + Shots + iSCF + Hits.Taken +
## Shots.Blocked + CF_pp + SCF_pp + LDCF_pp + Shots_pp + Rebounds.Created_pp +
## PIM_pp + Total.Penalties_pp + Minor_pp + CA_pk + SCA_pk +
## LDCA_pk + Total.Assists_pk + PIM_pk + Total.Penalties_pk +
## Minor_pk + Giveaways_pk, family = Gamma(link = "log"), data = d_train_df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.00038 -0.22529 0.01786 0.21782 0.72854
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 15.4079 0.1647 93.533 < 2e-16 ***
## Goals -43.5053 13.6619 -3.184 0.00164 **
## Shots 3.7504 1.8355 2.043 0.04213 *
## iSCF -6.2090 2.7194 -2.283 0.02330 *
## Hits.Taken -2.5047 0.7526 -3.328 0.00101 **
## Shots.Blocked -2.7418 1.1342 -2.417 0.01639 *
## CF_pp 24.4404 10.4749 2.333 0.02047 *
## SCF_pp -24.2866 10.8339 -2.242 0.02590 *
## LDCF_pp -25.2514 10.9237 -2.312 0.02165 *
## Shots_pp 15.1178 5.0835 2.974 0.00324 **
## Rebounds.Created_pp -42.5128 18.6720 -2.277 0.02369 *
## PIM_pp 72.2512 35.0528 2.061 0.04037 *
## Total.Penalties_pp -459.8654 191.7612 -2.398 0.01725 *
## Minor_pp 351.2200 128.4503 2.734 0.00672 **
## CA_pk -22.5169 7.7119 -2.920 0.00384 **
## SCA_pk 24.6426 7.8301 3.147 0.00186 **
## LDCA_pk 22.1086 8.2589 2.677 0.00795 **
## Total.Assists_pk 94.3023 39.9053 2.363 0.01893 *
## PIM_pk 56.7550 25.6723 2.211 0.02801 *
## Total.Penalties_pk -562.6875 200.4699 -2.807 0.00542 **
## Minor_pk 356.7556 152.0071 2.347 0.01975 *
## Giveaways_pk 24.5560 8.9598 2.741 0.00660 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Gamma family taken to be 0.1146516)
##
## Null deviance: 58.450 on 259 degrees of freedom
## Residual deviance: 32.468 on 238 degrees of freedom
## AIC: 8101
##
## Number of Fisher Scoring iterations: 6
f_train_df$EstCapHit <- step_f_glm$fitted.values
f_train_df$EstCapHit_res <- step_f_glm$residuals
d_train_df$EstCapHit <- step_d_glm$fitted.values
d_train_df$EstCapHit_res <- step_d_glm$residuals
ggplot(f_train_df, aes(x=EstCapHit_res)) + geom_histogram(bins = 15) + theme_economist() + labs(x = "Residuals", y = "Count", title = "Forwards Residuals") + theme(plot.title = element_text(hjust = 0.5), text = element_text(size=12), legend.position = "right") + scale_x_continuous(limits = c(-1.6, 1.6), breaks = (seq(-1.6, 1.6, 0.2)))
qqnorm(step_f_glm$residuals)
qqline(step_f_glm$residuals)
ggplot(d_train_df, aes(x=EstCapHit_res)) + geom_histogram(bins = 15) + theme_economist() + labs(x = "Residuals", y = "Count", title = "Defense Residuals") + theme(plot.title = element_text(hjust = 0.5), text = element_text(size=12), legend.position = "right") + scale_x_continuous(limits = c(-1.6, 1.6), breaks = (seq(-1.6, 1.6, 0.2)))
qqnorm(step_d_glm$residuals)
qqline(step_d_glm$residuals)
f_dev_df$EstCapHit <- predict(step_f_glm, f_dev_df[,c("CF", "Goals", "Total.Assists", "Shots", "Total.Penalties", "Minor", "Takeaways", "Hits.Taken", "CF_pp", "CA_pp", "SCA_pp", "HDCF_pp", "LDCA_pp", "Goals_pp", "Total.Assists_pp", "Shots_pp", "iSCF_pp", "iHDCF_pp", "SCA_pk", "HDCF_pk", "HDCA_pk", "Total.Assists_pk", "iCF_pk", "iSCF_pk", "Rebounds.Created_pk", "Penalties.Drawn_pk")], type = "response")
d_dev_df$EstCapHit <- predict(step_d_glm, d_dev_df[,c("Goals", "Shots", "iSCF", "Hits.Taken", "Shots.Blocked", "CF_pp", "SCF_pp", "LDCF_pp", "Shots_pp", "Rebounds.Created_pp", "PIM_pp", "Total.Penalties_pp", "Minor_pp", "CA_pk", "SCA_pk", "LDCA_pk", "Total.Assists_pk", "PIM_pk", "Total.Penalties_pk", "Minor_pk", "Giveaways_pk")], type = "response")
f_dev_summary_df <- f_dev_df %>% mutate(variation = 1 - CAP.HIT.. / EstCapHit) %>% summarize(avg = mean(variation), max = max(variation), min = min(variation), sd = sd(variation))
d_dev_summary_df <- d_dev_df %>% mutate(variation = 1 - CAP.HIT.. / EstCapHit) %>% summarize(avg = mean(variation), max = max(variation), min = min(variation), sd = sd(variation))
#variation
f_dev_summary_df
## avg max min sd
## 1 0.00282645 0.701659 -1.581916 0.4013641
d_dev_summary_df
## avg max min sd
## 1 0.02126109 0.6985328 -1.637093 0.4381454
f_train_df$type = "Training"
f_dev_df$type = "Development"
f_train_dev_df <- rbind(f_train_df[,c("type","CAP.HIT..", "EstCapHit")], f_dev_df[,c("type","CAP.HIT..", "EstCapHit")]) %>% gather(captype, "value", 2:3)
f_train_dev_df <- transform(f_train_dev_df, type=factor(type, levels=c("Training","Development")))
ggplot(f_train_dev_df, aes(x = captype, y=value)) + geom_boxplot() + facet_wrap(~type) + theme_economist() + labs(x = "Group", y = "Cap Hit (Millions)", title = "Actual Cap Hit vs. Training and Development Data") + theme(plot.title = element_text(hjust = 0.5), text = element_text(size=12), legend.position = "right") + scale_y_continuous(limits = c(0, 15000000), breaks = (seq(0, 15000000, 2000000)), labels = formattermillion) + scale_x_discrete(labels = c("Cap Hit", "Estimated"))
player_per_game_df[is.na(player_per_game_df)] <- 0
player_per_game_df$EstCapHit <- ifelse(player_per_game_df$Position!="D", predict(step_f_glm, player_per_game_df[,c("CF", "Goals", "Total.Assists", "Shots", "Total.Penalties", "Minor", "Takeaways", "Hits.Taken", "CF_pp", "CA_pp", "SCA_pp", "HDCF_pp", "LDCA_pp", "Goals_pp", "Total.Assists_pp", "Shots_pp", "iSCF_pp", "iHDCF_pp", "SCA_pk", "HDCF_pk", "HDCA_pk", "Total.Assists_pk", "iCF_pk", "iSCF_pk", "Rebounds.Created_pk", "Penalties.Drawn_pk")], type = "response"), predict(step_d_glm, player_per_game_df[,c("Goals", "Shots", "iSCF", "Hits.Taken", "Shots.Blocked", "CF_pp", "SCF_pp", "LDCF_pp", "Shots_pp", "Rebounds.Created_pp", "PIM_pp", "Total.Penalties_pp", "Minor_pp", "CA_pk", "SCA_pk", "LDCA_pk", "Total.Assists_pk", "PIM_pk", "Total.Penalties_pk", "Minor_pk", "Giveaways_pk")], type = "response"))
player_f_det_df <- player_per_game_df %>% filter(Position != "D") %>% select(Player, Team, year, CF, Goals, Total.Assists, Shots, Total.Penalties, Minor, Takeaways, Hits.Taken, CF_pp, CA_pp, SCA_pp, HDCF_pp, LDCA_pp, Goals_pp, Total.Assists_pp, Shots_pp, iSCF_pp, iHDCF_pp, SCA_pk, HDCF_pk, HDCA_pk, Total.Assists_pk, iCF_pk, iSCF_pk, Rebounds.Created_pk, Penalties.Drawn_pk, CAP.HIT.., EstCapHit)
player_d_det_df <- player_per_game_df %>% filter(Position == "D") %>% select(Player, Team, year, Goals, Shots, iSCF, Hits.Taken, Shots.Blocked, CF_pp, SCF_pp, LDCF_pp, Shots_pp, Rebounds.Created_pp, PIM_pp, Total.Penalties_pp, Minor_pp, CA_pk, SCA_pk, LDCA_pk, Total.Assists_pk, PIM_pk, Total.Penalties_pk, Minor_pk, Giveaways_pk, EstCapHit)
player_f_df <- player_per_game_df %>% filter(Position != "D") %>% filter(year < 2018) %>% filter(GP > 60) %>% select(Player, Team, year, GP, CAP.HIT.., EstCapHit) %>% rename(Cap_Hit = CAP.HIT.., Estimated_Cap_Hit = EstCapHit)
player_d_df <- player_per_game_df %>% filter(Position == "D") %>% filter(year < 2018) %>% filter(GP > 60) %>% select(Player, Team, year, GP, CAP.HIT.., EstCapHit) %>% rename(Cap_Hit = CAP.HIT.., Estimated_Cap_Hit = EstCapHit)
head(player_f_df %>% arrange(-Estimated_Cap_Hit), n = 20) %>% kable("html") %>% kable_styling()
| Player | Team | year | GP | Cap_Hit | Estimated_Cap_Hit |
|---|---|---|---|---|---|
| Claude_Giroux | PHI | 2016 | 78 | 8275000 | 9536499 |
| Jason_Spezza | DAL | 2015 | 82 | 7000000 | 9058724 |
| Alex_Ovechkin | WSH | 2017 | 82 | 9538462 | 8692597 |
| Tyler_Seguin | DAL | 2015 | 71 | 5750000 | 8291754 |
| Ryan_O_Reilly | BUF | 2017 | 72 | 7500000 | 8187712 |
| Jason_Spezza | DAL | 2016 | 75 | 7500000 | 8016420 |
| Tyler_Seguin | DAL | 2017 | 82 | 5750000 | 7953395 |
| Phil_Kessel | TOR | 2015 | 82 | 8000000 | 7935799 |
| Claude_Giroux | PHI | 2017 | 82 | 8275000 | 7836780 |
| Alex_Ovechkin | WSH | 2015 | 81 | 9538462 | 7798531 |
| Phil_Kessel | PIT | 2016 | 82 | 8000000 | 7572535 |
| Joe_Thornton | S.J | 2016 | 82 | 6750000 | 7527376 |
| Jason_Spezza | DAL | 2017 | 68 | 7500000 | 7518205 |
| Evgeni_Malkin | PIT | 2017 | 62 | 9500000 | 7453605 |
| Mark_Stone | OTT | 2016 | 75 | 3500000 | 7372516 |
| Bobby_Ryan | OTT | 2015 | 78 | 5100000 | 7342898 |
| Jamie_Benn | DAL | 2016 | 82 | 5250000 | 7328143 |
| Anze_Kopitar | L.A | 2016 | 81 | 6800000 | 7151951 |
| Mike_Hoffman | OTT | 2017 | 74 | 5187500 | 7081869 |
| Tyler_Seguin | DAL | 2016 | 72 | 5750000 | 7075758 |
head(player_d_df %>% arrange(-Estimated_Cap_Hit), n = 20) %>% kable("html") %>% kable_styling()
| Player | Team | year | GP | Cap_Hit | Estimated_Cap_Hit |
|---|---|---|---|---|---|
| P_K__Subban | MTL | 2016 | 68 | 9000000 | 10954811 |
| Keith_Yandle | FLA | 2017 | 82 | 6350000 | 9352123 |
| Dion_Phaneuf | TOR | 2015 | 70 | 7000000 | 8852881 |
| Shea_Weber | MTL | 2017 | 78 | 7857143 | 8819301 |
| Dougie_Hamilton | BOS | 2015 | 72 | 925000 | 7688128 |
| John_Carlson | WSH | 2017 | 72 | 3966667 | 7613749 |
| Kris_Letang | PIT | 2016 | 71 | 7250000 | 7353238 |
| Alexander_Edler | VAN | 2015 | 74 | 5000000 | 7305675 |
| Drew_Doughty | L.A | 2017 | 82 | 7000000 | 7257779 |
| Shea_Weber | NSH | 2015 | 78 | 7857143 | 7082954 |
| Aaron_Ekblad | FLA | 2017 | 68 | 925000 | 7072747 |
| Oliver_Ekman_Larsson | ARI | 2016 | 75 | 5500000 | 7044832 |
| Jake_Muzzin | L.A | 2017 | 82 | 4000000 | 6993346 |
| Brent_Burns | S.J | 2016 | 82 | 5760000 | 6947536 |
| Shayne_Gostisbehere | PHI | 2016 | 64 | 925000 | 6771965 |
| Kris_Letang | PIT | 2015 | 69 | 7250000 | 6715196 |
| Drew_Doughty | L.A | 2016 | 82 | 7000000 | 6577072 |
| P_K__Subban | MTL | 2015 | 82 | 9000000 | 6553050 |
| Ryan_Suter | MIN | 2016 | 82 | 7538462 | 6472063 |
| Torey_Krug | BOS | 2016 | 81 | 5250000 | 6439972 |
player_2018_f_df <- player_per_game_df %>% filter(Position != "D") %>% filter(year == 2018) %>% filter(GP > 5) %>% select(Player, Team, year, CAP.HIT.., EstCapHit) %>% rename(Cap_Hit = CAP.HIT.., Estimated_Cap_Hit = EstCapHit)
head(player_2018_f_df %>% arrange(-Estimated_Cap_Hit), n = 20) %>% kable("html") %>% kable_styling()
| Player | Team | year | Cap_Hit | Estimated_Cap_Hit |
|---|---|---|---|---|
| Jakub_Voracek | PHI | 2018 | 8250000 | 17932527 |
| Jonathan_Toews | CHI | 2018 | 10500000 | 15682475 |
| Evgeni_Malkin | PIT | 2018 | 9500000 | 11020279 |
| Evgeny_Kuznetsov | WSH | 2018 | 7800000 | 9733922 |
| Jonathan_Drouin | MTL | 2018 | 5500000 | 9508431 |
| Joe_Thornton | S.J | 2018 | 8000000 | 9469489 |
| Mike_Hoffman | OTT | 2018 | 5187500 | 9319404 |
| David_Backes | BOS | 2018 | 6000000 | 9225687 |
| Patrick_Kane | CHI | 2018 | 10500000 | 9144583 |
| Johnny_Gaudreau | CGY | 2018 | 6750000 | 8738026 |
| Patrice_Bergeron | BOS | 2018 | 6875000 | 8357984 |
| Vladimir_Tarasenko | STL | 2018 | 7500000 | 8332791 |
| Blake_Wheeler | WPG | 2018 | 5600000 | 8156201 |
| Steven_Stamkos | T.B | 2018 | 8500000 | 8151298 |
| Max_Pacioretty | MTL | 2018 | 4500000 | 8024670 |
| Taylor_Hall | N.J | 2018 | 6000000 | 7982195 |
| Bobby_Ryan | OTT | 2018 | 7250000 | 7870804 |
| Thomas_Vanek | VAN | 2018 | 2000000 | 7842372 |
| Barclay_Goodrow | S.J | 2018 | 650000 | 7837137 |
| Alex_Galchenyuk | MTL | 2018 | 4900000 | 7772333 |
player_2018_d_df <- player_per_game_df %>% filter(Position == "D") %>% filter(year == 2018) %>% filter(GP > 5) %>% select(Player, Team, year, CAP.HIT.., EstCapHit) %>% rename(Cap_Hit = CAP.HIT.., Estimated_Cap_Hit = EstCapHit)
head(player_2018_d_df %>% arrange(-Estimated_Cap_Hit), n = 20) %>% kable("html") %>% kable_styling()
| Player | Team | year | Cap_Hit | Estimated_Cap_Hit |
|---|---|---|---|---|
| Dustin_Byfuglien | WPG | 2018 | 7600000 | 18654542 |
| Roman_Josi | NSH | 2018 | 4000000 | 12868584 |
| P_K__Subban | NSH | 2018 | 9000000 | 12836368 |
| Brad_Hunt | VGK | 2018 | 650000 | 12238751 |
| Duncan_Keith | CHI | 2018 | 5538462 | 9839100 |
| Sami_Vatanen | ANA | 2018 | 4875000 | 9756144 |
| Matt_Tennyson | BUF | 2018 | 650000 | 9441197 |
| Keith_Yandle | FLA | 2018 | 6350000 | 9389085 |
| Kris_Letang | PIT | 2018 | 7250000 | 9269856 |
| Shea_Weber | MTL | 2018 | 7857143 | 8908570 |
| Adam_Pelech | NYI | 2018 | 1600000 | 8905920 |
| Ben_Hutton | VAN | 2018 | 2800000 | 8678523 |
| Rasmus_Ristolainen | BUF | 2018 | 5400000 | 8518337 |
| Karl_Alzner | MTL | 2018 | 4625000 | 8307787 |
| Erik_Karlsson | OTT | 2018 | 6500000 | 8246111 |
| Drew_Doughty | L.A | 2018 | 7000000 | 7940454 |
| John_Carlson | WSH | 2018 | 3966667 | 7925082 |
| Alec_Martinez | L.A | 2018 | 4000000 | 7835307 |
| Colin_Miller | VGK | 2018 | 1000000 | 7808783 |
| Alex_Pietrangelo | STL | 2018 | 6500000 | 7207172 |
player_per_game_df$EstCapHit <- sapply(player_per_game_df$EstCapHit, function(x){
x <- ifelse(x > 15000000, 15000000, x)
})
player_per_game_df$EstCapHit <- sapply(player_per_game_df$EstCapHit, function(x){
x <- ifelse(x < 650000, 650000, x)
})
nst_team_stats_df <- rbind(nst_team_2015_df, nst_team_2016_df, nst_team_2017_df, nst_team_2018_df)
nst_team_stats_df$points <- nst_team_stats_df$W * 2 + nst_team_stats_df$OTL
nst_team_stats_df$teamabbr <- sapply(nst_team_stats_df$Team, function(x) {
x <- ifelse(grepl("anaheim", tolower(x)), "ANA",
ifelse(grepl("arizona", tolower(x)), "ARI",
ifelse(grepl("boston", tolower(x)), "BOS",
ifelse(grepl("buffalo", tolower(x)), "BUF",
ifelse(grepl("calgary", tolower(x)), "CGY",
ifelse(grepl("carolina", tolower(x)), "CAR",
ifelse(grepl("chicago", tolower(x)), "CHI",
ifelse(grepl("colorado", tolower(x)), "COL",
ifelse(grepl("columbus", tolower(x)), "CBJ",
ifelse(grepl("dallas", tolower(x)), "DAL",
ifelse(grepl("detroit", tolower(x)), "DET",
ifelse(grepl("edmonton", tolower(x)), "EDM",
ifelse(grepl("florida", tolower(x)), "FLA",
ifelse(grepl("los angeles", tolower(x)), "L.A",
ifelse(grepl("minnesota", tolower(x)), "MIN",
ifelse(grepl("montreal", tolower(x)), "MTL",
ifelse(grepl("nashville", tolower(x)), "NSH",
ifelse(grepl("new jersey", tolower(x)), "N.J",
ifelse(grepl("islanders", tolower(x)), "NYI",
ifelse(grepl("rangers", tolower(x)), "NYR",
ifelse(grepl("ottawa", tolower(x)), "OTT",
ifelse(grepl("philadelphia", tolower(x)), "PHI",
ifelse(grepl("pittsburgh", tolower(x)), "PIT",
ifelse(grepl("san jose", tolower(x)), "S.J",
ifelse(grepl("st louis", tolower(x)), "STL",
ifelse(grepl("tampa bay", tolower(x)), "T.B",
ifelse(grepl("toronto", tolower(x)), "TOR",
ifelse(grepl("los angeles", tolower(x)), "L.A",
ifelse(grepl("vancouver", tolower(x)), "VAN",
ifelse(grepl("vegas", tolower(x)), "VGK",
ifelse(grepl("washington", tolower(x)), "WSH",
ifelse(grepl("winnipeg", tolower(x)), "WPG",
"other"))))))))))))))))))))))))))))))))
})
player_per_game_df$CapxTOI <- player_per_game_df$CAP.HIT.. * player_per_game_df$TOI
player_per_game_df$EstCapxTOI <- player_per_game_df$EstCapHit * player_per_game_df$TOI
team_summary_df <- player_per_game_df %>% filter(GP>10) %>% filter(year<2018) %>% mutate(countcolumn = 1) %>% group_by(Team, year) %>% summarize(weighted = sum(CapxTOI), estweighted = sum(EstCapxTOI), captotal = sum(CAP.HIT..), time = sum(TOI), players = sum(countcolumn)) %>% mutate(weighted_cap = weighted / time) %>% mutate(est_weighted_cap = estweighted / time) %>% mutate(cap = captotal / sum(players))
keep <- c("teamabbr", "year","points", "PDO", "SV.", "W", "CF.", "FF.", "SF.", "LDSV.", "MDSV.", "HDSV.", "GF", "GP", "HDCF")
team_summary_df <- merge(team_summary_df, nst_team_stats_df[,names(nst_team_stats_df) %in% keep], by.x = c("Team", "year"), by.y = c("teamabbr", "year")) %>% mutate(year = factor(year))
ggplot(team_summary_df, aes(x=est_weighted_cap, y=points, color=year)) + geom_point() + theme_economist() + labs(x = "Estimated Average Player Cap Hit (millions)", y = "Points", title = "Estimated Average Player Cap Hit Relation to Points") + theme(plot.title = element_text(hjust = 0.5), text = element_text(size=12), legend.position = "right") + scale_x_continuous(limits = c(3000000, 5500000), breaks = (seq(3000000, 5500000, 500000)), labels = formattermillion) + scale_y_continuous(limits = c(40, 130), breaks = (seq(40, 120, 10))) + scale_color_manual(values = c("mediumblue", "steelblue", "springgreen"))
I have added in the varying types of save percentage (shots from low-danger, mid-danger, and high-danger areas) since this study has not included the impact of goaltending.
team_lm <- lm(data = team_summary_df, points ~ est_weighted_cap + LDSV. + MDSV. + HDSV.)
team_step_lm <- step(team_lm, trace = FALSE)
summary(team_step_lm)
##
## Call:
## lm(formula = points ~ est_weighted_cap + LDSV. + MDSV. + HDSV.,
## data = team_summary_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -33.815 -8.980 2.360 8.143 26.283
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.047e+03 4.374e+02 -4.681 1.07e-05 ***
## est_weighted_cap 1.020e-05 4.095e-06 2.491 0.01470 *
## LDSV. 1.251e+01 4.417e+00 2.832 0.00578 **
## MDSV. 6.393e+00 2.094e+00 3.053 0.00303 **
## HDSV. 2.855e+00 9.299e-01 3.070 0.00287 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.15 on 85 degrees of freedom
## Multiple R-squared: 0.333, Adjusted R-squared: 0.3016
## F-statistic: 10.61 on 4 and 85 DF, p-value: 5.089e-07
team_prediction_df <- player_per_game_df %>% filter(GP>0) %>% filter(year==2018) %>% mutate(countcolumn = 1) %>% group_by(Team, year) %>% summarize(weighted = sum(CapxTOI), estweighted = sum(EstCapxTOI), captotal = sum(CAP.HIT..), time = sum(TOI), players = sum(countcolumn)) %>% mutate(weighted_cap = weighted / time) %>% mutate(est_weighted_cap = estweighted / time) %>% mutate(cap = captotal / sum(players))
keep <- c("teamabbr", "year","points", "PDO", "SV.", "W", "CF.", "FF.", "SF.", "LDSV.", "MDSV.", "HDSV.", "GP", "GF", "HDCF")
team_prediction_df <- merge(team_prediction_df, nst_team_stats_df[,names(nst_team_stats_df) %in% keep], by.x = c("Team", "year"), by.y = c("teamabbr", "year"))
team_prediction_df$PredictedPoints <- predict(team_step_lm, team_prediction_df[,c("est_weighted_cap", "LDSV.", "MDSV.", "HDSV.")])
team_prediction_df <- team_prediction_df %>% mutate(positionrank = rank(-points, ties.method = "average"), estpositionrank = rank(-PredictedPoints, ties.method = "average")) %>% mutate(year = factor(year))
ggplot(team_prediction_df, aes(x=est_weighted_cap, y=PredictedPoints, color=year)) + geom_point() + theme_economist() + labs(x = "Estimated Average Player Cap Hit (millions)", y = "Estimated Points", title = "Estimated 2018 Results") + theme(plot.title = element_text(hjust = 0.5), text = element_text(size=12), legend.position = "none") + scale_x_continuous(limits = c(3000000, 6500000), breaks = (seq(3000000, 6500000, 500000)), labels = formattermillion) + scale_y_continuous(limits = c(60, 130), breaks = (seq(60, 130, 10))) + scale_color_manual(values = c("mediumblue"))
colnames(team_prediction_df) <- c("Team", "year", "weighted", "estweighted", "captotal", "time", "players", "weighted_cap", "Estimated_Cap", "cap", "GP","Wins", "CF_Perc", "FF_Perc", "SF_Perc", "GF", "HDCF","HDSV_Perc", "MDSV_Perc", "LDSV_Perc", "SV_Perc", "PDO", "Points", "Predicted_Points", "Rank", "Est_Rank")
head(team_prediction_df %>% arrange(-Predicted_Points), n = 32) %>% select(Team, Estimated_Cap, LDSV_Perc, MDSV_Perc, HDSV_Perc, PDO, Points, Rank, Est_Rank, Predicted_Points) %>% kable("html") %>% kable_styling()
| Team | Estimated_Cap | LDSV_Perc | MDSV_Perc | HDSV_Perc | PDO | Points | Rank | Est_Rank | Predicted_Points |
|---|---|---|---|---|---|---|---|---|---|
| WSH | 4579948 | 99.28 | 96.65 | 87.86 | 1.020 | 35 | 10.0 | 1 | 109.77320 |
| CHI | 5688806 | 98.53 | 95.80 | 88.71 | 1.004 | 29 | 22.0 | 2 | 108.69379 |
| PHI | 4141515 | 99.09 | 96.61 | 89.96 | 1.017 | 29 | 22.0 | 3 | 108.66507 |
| ANA | 3968388 | 99.52 | 96.87 | 87.55 | 1.013 | 30 | 19.0 | 4 | 107.05955 |
| L.A | 4044817 | 98.14 | 98.06 | 90.23 | 1.019 | 41 | 2.0 | 5 | 105.83647 |
| S.J | 4585989 | 99.36 | 96.33 | 86.70 | 0.979 | 32 | 15.5 | 6 | 105.47810 |
| NSH | 5262799 | 98.67 | 95.65 | 88.39 | 1.005 | 39 | 4.0 | 7 | 104.22788 |
| T.B | 4447205 | 98.94 | 96.11 | 88.68 | 1.029 | 42 | 1.0 | 8 | 103.05582 |
| STL | 4737125 | 98.67 | 95.99 | 88.32 | 1.013 | 40 | 3.0 | 9 | 100.84051 |
| WPG | 5372630 | 98.91 | 94.99 | 87.14 | 1.010 | 38 | 5.0 | 10 | 100.56211 |
| VAN | 4493144 | 97.21 | 97.79 | 90.60 | 1.009 | 32 | 15.5 | 11 | 98.10664 |
| MTL | 5273781 | 98.86 | 95.82 | 84.42 | 0.989 | 30 | 19.0 | 12 | 96.46904 |
| CAR | 4508526 | 98.82 | 94.62 | 89.79 | 0.991 | 28 | 24.0 | 13 | 95.82431 |
| CBJ | 3915007 | 98.74 | 96.73 | 87.50 | 1.003 | 35 | 10.0 | 14 | 95.72130 |
| CGY | 4993286 | 97.49 | 97.27 | 87.88 | 0.990 | 32 | 15.5 | 15 | 95.61992 |
| NYR | 4228625 | 98.83 | 95.43 | 88.68 | 1.007 | 32 | 15.5 | 16 | 95.10387 |
| N.J | 4099493 | 98.53 | 96.39 | 86.76 | 1.018 | 36 | 7.0 | 17 | 90.68997 |
| VGK | 4695134 | 98.06 | 97.19 | 84.47 | 1.004 | 35 | 10.0 | 18 | 89.46206 |
| BOS | 4063974 | 99.35 | 93.77 | 88.68 | 0.999 | 30 | 19.0 | 19 | 89.31715 |
| NYI | 4185823 | 97.92 | 96.37 | 87.50 | 1.017 | 35 | 10.0 | 20 | 85.92537 |
| BUF | 4947457 | 98.37 | 95.81 | 83.41 | 0.971 | 18 | 31.0 | 21 | 84.06481 |
| DET | 4164180 | 98.61 | 95.44 | 85.60 | 0.989 | 27 | 25.0 | 22 | 82.96564 |
| COL | 4662615 | 97.71 | 97.23 | 83.64 | 0.989 | 26 | 26.5 | 23 | 82.63877 |
| PIT | 4611159 | 98.34 | 94.03 | 87.55 | 0.961 | 35 | 10.0 | 24 | 80.70054 |
| OTT | 4483309 | 97.74 | 95.98 | 85.24 | 0.985 | 25 | 28.0 | 25 | 77.76257 |
| TOR | 3240157 | 97.73 | 95.79 | 89.58 | 1.025 | 37 | 6.0 | 26 | 76.13523 |
| FLA | 4263779 | 98.25 | 94.85 | 85.59 | 0.993 | 26 | 26.5 | 27 | 75.67839 |
| ARI | 3643413 | 98.00 | 95.87 | 86.41 | 0.978 | 19 | 30.0 | 28 | 75.08618 |
| DAL | 3651222 | 97.95 | 93.72 | 90.55 | 0.995 | 33 | 13.0 | 29 | 72.61582 |
| EDM | 3492833 | 98.19 | 93.89 | 89.43 | 0.989 | 24 | 29.0 | 30 | 71.89163 |
| MIN | 3662911 | 97.30 | 95.25 | 89.50 | 0.996 | 29 | 22.0 | 31 | 71.38784 |
head(team_prediction_df %>% arrange(-Estimated_Cap), n = 32) %>% select(Team, Estimated_Cap, LDSV_Perc, MDSV_Perc, HDSV_Perc, PDO, Points, Rank, Est_Rank, Predicted_Points) %>% kable("html") %>% kable_styling()
| Team | Estimated_Cap | LDSV_Perc | MDSV_Perc | HDSV_Perc | PDO | Points | Rank | Est_Rank | Predicted_Points |
|---|---|---|---|---|---|---|---|---|---|
| CHI | 5688806 | 98.53 | 95.80 | 88.71 | 1.004 | 29 | 22.0 | 2 | 108.69379 |
| WPG | 5372630 | 98.91 | 94.99 | 87.14 | 1.010 | 38 | 5.0 | 10 | 100.56211 |
| MTL | 5273781 | 98.86 | 95.82 | 84.42 | 0.989 | 30 | 19.0 | 12 | 96.46904 |
| NSH | 5262799 | 98.67 | 95.65 | 88.39 | 1.005 | 39 | 4.0 | 7 | 104.22788 |
| CGY | 4993286 | 97.49 | 97.27 | 87.88 | 0.990 | 32 | 15.5 | 15 | 95.61992 |
| BUF | 4947457 | 98.37 | 95.81 | 83.41 | 0.971 | 18 | 31.0 | 21 | 84.06481 |
| STL | 4737125 | 98.67 | 95.99 | 88.32 | 1.013 | 40 | 3.0 | 9 | 100.84051 |
| VGK | 4695134 | 98.06 | 97.19 | 84.47 | 1.004 | 35 | 10.0 | 18 | 89.46206 |
| COL | 4662615 | 97.71 | 97.23 | 83.64 | 0.989 | 26 | 26.5 | 23 | 82.63877 |
| PIT | 4611159 | 98.34 | 94.03 | 87.55 | 0.961 | 35 | 10.0 | 24 | 80.70054 |
| S.J | 4585989 | 99.36 | 96.33 | 86.70 | 0.979 | 32 | 15.5 | 6 | 105.47810 |
| WSH | 4579948 | 99.28 | 96.65 | 87.86 | 1.020 | 35 | 10.0 | 1 | 109.77320 |
| CAR | 4508526 | 98.82 | 94.62 | 89.79 | 0.991 | 28 | 24.0 | 13 | 95.82431 |
| VAN | 4493144 | 97.21 | 97.79 | 90.60 | 1.009 | 32 | 15.5 | 11 | 98.10664 |
| OTT | 4483309 | 97.74 | 95.98 | 85.24 | 0.985 | 25 | 28.0 | 25 | 77.76257 |
| T.B | 4447205 | 98.94 | 96.11 | 88.68 | 1.029 | 42 | 1.0 | 8 | 103.05582 |
| FLA | 4263779 | 98.25 | 94.85 | 85.59 | 0.993 | 26 | 26.5 | 27 | 75.67839 |
| NYR | 4228625 | 98.83 | 95.43 | 88.68 | 1.007 | 32 | 15.5 | 16 | 95.10387 |
| NYI | 4185823 | 97.92 | 96.37 | 87.50 | 1.017 | 35 | 10.0 | 20 | 85.92537 |
| DET | 4164180 | 98.61 | 95.44 | 85.60 | 0.989 | 27 | 25.0 | 22 | 82.96564 |
| PHI | 4141515 | 99.09 | 96.61 | 89.96 | 1.017 | 29 | 22.0 | 3 | 108.66507 |
| N.J | 4099493 | 98.53 | 96.39 | 86.76 | 1.018 | 36 | 7.0 | 17 | 90.68997 |
| BOS | 4063974 | 99.35 | 93.77 | 88.68 | 0.999 | 30 | 19.0 | 19 | 89.31715 |
| L.A | 4044817 | 98.14 | 98.06 | 90.23 | 1.019 | 41 | 2.0 | 5 | 105.83647 |
| ANA | 3968388 | 99.52 | 96.87 | 87.55 | 1.013 | 30 | 19.0 | 4 | 107.05955 |
| CBJ | 3915007 | 98.74 | 96.73 | 87.50 | 1.003 | 35 | 10.0 | 14 | 95.72130 |
| MIN | 3662911 | 97.30 | 95.25 | 89.50 | 0.996 | 29 | 22.0 | 31 | 71.38784 |
| DAL | 3651222 | 97.95 | 93.72 | 90.55 | 0.995 | 33 | 13.0 | 29 | 72.61582 |
| ARI | 3643413 | 98.00 | 95.87 | 86.41 | 0.978 | 19 | 30.0 | 28 | 75.08618 |
| EDM | 3492833 | 98.19 | 93.89 | 89.43 | 0.989 | 24 | 29.0 | 30 | 71.89163 |
| TOR | 3240157 | 97.73 | 95.79 | 89.58 | 1.025 | 37 | 6.0 | 26 | 76.13523 |
Goaltending value was not calculated.
This analysis does not take into account scheduling, which can greatly impact the possible number of points for each team.
Relative values for advanced statistics (e.g. Corsi, Fenwick) should be pulled from other data sources and analyzed.
Given recent rule changes, it appears the average number of goals scored per game has now increased from previous years.
pre_2018_scoring_df <- team_summary_df %>% group_by(year) %>% summarize(GPT = sum(GP), GFT = sum(GF)) %>% mutate(GPG = GFT / GPT)
cur_2018_scoring_df <- team_prediction_df %>% mutate(year = as.numeric(as.character(year))) %>% summarize(year = max(year), GPT = sum(GP), GFT = sum(GF)) %>% mutate(GPG = GFT / GPT)
rbind(cur_2018_scoring_df, pre_2018_scoring_df) %>% arrange(year) %>% rename(games_played = GPT, goals_for = GFT, goals_per_game_per_team = GPG) %>% kable("html") %>% kable_styling()
| year | games_played | goals_for | goals_per_game_per_team |
|---|---|---|---|
| 2015 | 2460 | 4404 | 1.790244 |
| 2016 | 2460 | 4248 | 1.726829 |
| 2017 | 2460 | 4475 | 1.819106 |
| 2018 | 876 | 1651 | 1.884703 |