2016-2017 NHL Player Statistics Analysis

Loading Packages and Data Below

# Set the working directory, name the data file, load the libraries, and read in the data.
setwd("//Users/markgranatire//Documents//DS736//NHL Player Stats & Salary")
file1 <- "nhlplayerstats.csv"

# Dataset I found is only a glimpse of a good portion of player stats and salaries from 2016 to 2017 NHL Season
library(data.table)
library(DescTools)
library(ggplot2)
library(lubridate)
library(ggrepel)
library(dplyr)
library(scales)
library(ggthemes)
library(RColorBrewer)
library(tidyr)
library(leaflet)
library(stringr)
nhl_df <- fread(file1)

Introduction to the Data

I’ve compiled a dataset focusing on player statistics and salaries for the 2016-2017 NHL season, sourced from https://www.kaggle.com/datasets/camnugent/predict-nhl-player-salaries/data. I combined test.csv and test_salaries.csv because the rows of test_salaries.csv line up with the rows of test.csv, named the result nhlplayerstats.csv, and read that file in. I also created TeamLocations.csv, which holds the Team, Division, Conference, Latitude, and Longitude of each team. According to quanthockey.com, around 888 players dressed for an NHL game during that season. With 612 rows and 143 columns, the dataset covers the majority of those players and includes salary, position, team, birth date, hometown, draft history, and basic performance statistics such as games played (GP), goals (G), assists (A), points (PTS), plus/minus (+/-), penalty minutes (PIM), and time on ice per game (TOI/GP).

One challenge I encountered was handling players who had played for multiple teams during the season. To address this, I modified the data so that each player was associated with only one of the 30 NHL teams. Additionally, some players were undrafted, so I filled the draft history fields with the value ‘Undrafted’ for these cases. These steps streamlined the dataset and kept the focus on essential player attributes. Overall, the dataset is a useful resource for exploring player performance, salary distributions, geographic trends, and draft dynamics, aligning with my lifelong passion for hockey.
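As a minimal sketch of the one-team-per-player step, assume that a player who changed teams has a Team value made of slash-separated abbreviations (the sample values below are hypothetical). The pattern mirrors the `str_replace(Team, ".*\\/", "")` call used later in this report, which keeps the last abbreviation in the string. Whether the last or the first abbreviation is the player's most recent team is not something the data confirms, so both variants are shown.

```r
# Hypothetical Team values; the slash-separated format is an assumption
team_raw <- c("ANA", "BOS/TOR", "N.J/S.J/NYR", "CHI")

# The greedy ".*/" consumes everything up to and including the last slash,
# so only the final abbreviation remains
team_last <- sub(".*/", "", team_raw)

# Alternative: drop everything from the first slash onward to keep the first team
team_first <- sub("/.*", "", team_raw)

print(data.frame(team_raw, team_last, team_first))
```

Single-team values pass through both patterns unchanged, so the step is safe to apply to the whole column.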

Visualizations

First Visualization - Top 10 Point Leaders

# Code for the first visualization
# Now time to get rid of columns that don't mean much to the average hockey fan
ColsToDrop <- c("PDO", "F/60", "A/60", "Pct%", "Diff", "Diff/60", "iCF", "iCF", "iFF", "iSF", "iSF", "iSF", "ixG", "iSCF", "iRB", "iRS", "iDS", "sDist", "sDist", 
                "Pass", "iHF", "iHF", "iHA", "iHDf", "iMiss", "iGVA", "iTKA", "iBLK", "iGVA", "iTKA", "iBLK", "BLK%", "iFOW", "iFOL", "iFOW", "iFOL", 
                "FO%", "%FOT", "dzFOW", "dzFOL", "nzFOW", "nzFOL", "ozFOW", "ozFOL", "FOW.Up", "FOL.Up", "FOW.Down", "FOL.Down", "FOW.Close", "FOL.Close", "OTG", "1G", "GWG", "ENG", "PSG", "PSA", 
                "G.Bkhd", "G.Dflct", "G.Slap", "G.Snap", "G.Tip", "G.Wrap", "G.Wrst", "CBar", "Post", "Over", "Wide", "S.Bkhd", "S.Dflct", "S.Slap", "S.Snap", "S.Tip", "S.Wrap", "S.Wrst", "iPenT", "iPenD", 
                "iPENT", "iPEND", "iPenDf", "NPD", "Min", "Maj", "Match", "Misc", "Game", "CF", "CA", "FF", "FA", "SF", "SA", "xGF", "xGA", "SCF", "SCA", "GF", 
                "GA", "RBF", "RBA", "RSF", "RSA", "DSF", "DSA", "FOW", "FOL", "HF", "HA", "GVA", "TKA", "PENT", "PEND", "OPS", "DPS", "PS", "OTOI", "Grit", 
                "DAP", "Pace", "GS", "GS/G")

# Function to drop the columns and relook at the data
nhl_df <- nhl_df[, setdiff(colnames(nhl_df), ColsToDrop), with = FALSE]

# Decided to drop additional columns that are not needed
AdditionalColsToDrop <- c("Nat", "A1", "A2", "E+/-", "Shifts", "TOI", "TOIX", "TOI%", "IPP%", "SH%", "SV%")
nhl_df <- nhl_df[, setdiff(colnames(nhl_df), AdditionalColsToDrop), with = FALSE]

# NA values in the draft columns mark undrafted players, so fill them with "Undrafted"
nhl_df <- nhl_df %>%
  mutate(
    DftYr = ifelse(is.na(DftYr), "Undrafted", DftYr),
    DftRd = ifelse(is.na(DftRd), "Undrafted", DftRd),
    Ovrl = ifelse(is.na(Ovrl), "Undrafted", Ovrl)
    ) %>%
  as.data.frame() # as.data.frame() keeps column names such as `Last Name`, which data.frame() would rename

# First visualization will be top 10 players in pts and their salary
# wanted to sort the player points
stats_sorted <- nhl_df[order(nhl_df$PTS, decreasing = TRUE), ]

# wanted to just get the top 10 players of nhl stats
top10 <- head(stats_sorted, 10)
row.names(top10) <- NULL

# changed the salary to numeric
top10$Salary <- as.numeric(top10$Salary)

# Wanted to reorder the top10 data frame with Last Name then PTS
top10 <- top10 %>%
  mutate(`Last Name` = reorder(`Last Name`, PTS))

# Set salary data frame to the top10 data frame
salary_df <- top10 %>% select(`Last Name`, Salary)

# Wanted to create labels for a secondary axis and make it readable for the viewer
second_ylab <- seq(0, max(top10$Salary)/1e6, 2)
#first_ylab <- seq(0, max(top10$PTS), by = 10)
my_labels <- paste0("$", second_ylab, "M")

# Setting the ggplot to get the Top 10 players with PTS and their salary as a secondary axis
ggplot(top10, aes(x = `Last Name`, y = PTS, fill = "PTS")) +
  geom_bar(stat = "identity", position = position_stack(reverse = TRUE)) + 
  coord_flip() + 
  theme_light() + 
  labs(title = "Top 10 Point Leaders and Salary", x = "Player Name", y = "Number of Points", fill = "Legend") + 
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_fill_brewer(palette = "Blues", guide = guide_legend(reverse = TRUE)) +
  geom_text(data = top10, aes(label = scales::comma(PTS), x = `Last Name`, y = PTS, fill = NULL), hjust = -0.1, size = 5, color = "black") +
  geom_line(inherit.aes = FALSE, data = salary_df, 
              aes(x = `Last Name`, y = Salary/75000, colour = "Salary", group = 1), linewidth = 1) +
  scale_color_manual(NULL, values = "black") +
  scale_y_continuous(labels = comma,
                      sec.axis = sec_axis(~. *75000, name = "Salary", labels = my_labels,
                                          breaks = second_ylab*1e6)) +
  geom_point(inherit.aes = FALSE, data = salary_df,
              aes(x = `Last Name`, y = Salary/75000, group = 1),
              size = 3, shape = 21, fill = "yellow", color = "black") +
  theme(legend.background = element_rect(fill = "transparent"),
         legend.box.background = element_rect(fill = "transparent", colour = "black"),
         legend.spacing = unit(-1, "lines"))

Insight: This dual-axis graph presents the top 10 NHL point leaders alongside their respective salaries. Player names are mapped to the x-axis, the primary y-axis represents the number of points, and the secondary y-axis represents player salaries in millions of dollars. I flipped the coordinates so that the player names run down the left side, which leaves the bottom and top of the chart for the two y-axes. The horizontal bars show the number of points, while player salaries are drawn as a line with markers. Kane and Crosby emerge as the top performers with 89 points each, and their high salaries reflect their significant contributions to their teams. Draisaitl’s performance is impressive relative to his salary, which is under $1 million despite his 77 points. Conversely, Tarasenko falls short of expectations with only 75 points, despite being one of the league’s highest-paid players at around $8 million. This visualization underscores the importance of key players within their teams and highlights those who exceed or fall short of performance expectations. As NHL teams plan roster improvements during the offseason, this kind of data can inform decisions about which players to pursue in free agency and what they are worth.
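The dual axis works because the salary line is drawn on the points scale and the secondary axis then undoes that rescaling. The following is a minimal sketch of that round trip in base R, using the factor of 75,000 from the plot code. The helper names and sample salaries are hypothetical.

```r
# Dollars of salary represented by one unit on the points axis (from the plot code)
scale_factor <- 75000

# Forward transform: applied to the data before it is drawn (Salary / 75000)
to_points_axis <- function(salary) salary / scale_factor

# Inverse transform: what sec_axis(~ . * 75000) applies to label the second axis
to_salary_axis <- function(axis_value) axis_value * scale_factor

# Hypothetical salaries in dollars
salaries <- c(925000, 8000000, 13800000)

plotted <- to_points_axis(salaries)
recovered <- to_salary_axis(plotted)

# The two transforms must be exact inverses, otherwise the axis labels
# would not match the plotted salaries
print(all.equal(recovered, salaries))
```

If the factor is changed in one place but not the other, the line and the secondary axis drift apart without any warning, so it can help to store the factor in a single variable and reuse it in both the `aes()` call and `sec_axis()`.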

Second Visualization - In-Depth Analysis of Top 10 Point Leaders

#data frame that gets all the player stats from the top10 data frame; pivot_longer from tidyr splits those stats into 2 columns (Statistic and Value)
top10_playerstats <- top10 %>%
  select(`Last Name`, GP, G, A, PTS, `+/-`, PIM, `TOI/GP`) %>%
  pivot_longer(cols = c(GP, G, A, PTS, `+/-`, PIM, `TOI/GP`), names_to = "Statistic", values_to = "Value") %>%
  mutate(Statistic = factor(Statistic, levels = c("GP", "G", "A", "PTS", "+/-", "PIM", "TOI/GP"))) %>%
  as.data.frame()

#set last name to be factor
top10_playerstats$`Last Name` <- factor(top10_playerstats$`Last Name`, levels = (unique(top10_playerstats$`Last Name`)))

#trellis chart to show all key stats for the top 10 players
ggplot(top10_playerstats, aes(x = Statistic, y = Value, fill = Statistic)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = comma) +
  labs(title = "Multiple Bar Charts - Top 10 Players Statistics",
       x = "Statistic",
       y = "Value",
       fill = "Statistics") +
  scale_fill_brewer(palette = "Set2") +
  geom_text(data = top10_playerstats, aes(label = scales::comma(Value), x = Statistic, y = Value, fill = NULL), vjust = -0.5, size = 3, color = "black") +
  facet_wrap(~`Last Name`, ncol=5, nrow = 2, scales = "free_x")

Insight: Continuing the analysis of the top 10 NHL point leaders of the 2016-2017 season, this visualization uses a trellis chart. A detailed view of these top players helps both fans and management assess player effectiveness throughout the season. Each player is represented with key metrics including games played (GP), goals (G), assists (A), total points (PTS), plus/minus (+/-), penalty minutes (PIM), and time on ice per game (TOI/GP). The data show that these players participated in the majority of their team’s games and tended to have more assists than goals. Notably, Brad Marchand stands out with 81 penalty minutes, considerably more than the other top players. While Marchand demonstrates goal-scoring prowess, his significant time spent in the penalty box may adversely affect his team’s performance. Additionally, Vladimir Tarasenko’s -1 plus/minus rating raises eyebrows. A positive plus/minus indicates being on the ice for more goals scored by one’s team, whereas a negative rating indicates being on the ice for more goals against. Tarasenko’s negative rating suggests he was on the ice for slightly more goals against than for, which warrants further analysis of his defensive contributions.
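The trellis chart relies on the statistics being stored in long form, with one row per player and statistic. The sketch below shows that reshaping step on a small toy table with `tidyr::pivot_longer`. The two-column table and its values are illustrative only and stand in for the `top10` data frame.

```r
library(tidyr)

# Toy wide table: one row per player, one column per statistic (illustrative values)
wide <- data.frame(
  player = c("Kane", "Crosby"),
  G = c(34, 44),
  A = c(55, 45)
)

# Long form: each statistic becomes its own row, which is what
# facet_wrap() and fill = Statistic need in the plot
long <- pivot_longer(
  wide,
  cols = c(G, A),
  names_to = "Statistic",
  values_to = "Value"
)

print(long)
```

Because every statistic ends up in the same Value column, statistics with very different ranges share one y-axis within each panel, which is worth keeping in mind when reading the bars.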

Third Visualization - Top 5 Point Leaders In Popular Birth Years

#created data frame that reduces each player's date of birth to the birth year and counts players per year
years_df <- nhl_df %>%
  select(Born) %>%
  mutate(year = year(ymd(Born))) %>%
  group_by(year) %>%
  summarise(n = length(Born), .groups = 'keep') %>%
  data.frame()

#gets the top 8 birth years with the most players in each
years_df <- years_df[order(years_df$n, decreasing = TRUE),]
years_df <- head(years_df, 8)
years_df$year <- factor(years_df$year)
nhl_df$Born <- factor(nhl_df$Born)

#gets the top players based off points
stats_sorted <- nhl_df[order(nhl_df$PTS, decreasing = TRUE), ]
row.names(stats_sorted) <- NULL

#creates data frame that gets the top players in each birth year
playerlist_in_top5_birthyear <- nhl_df %>%
  filter(year(ymd(Born)) %in% years_df$year) %>%
  select(`First Name`, `Last Name`, PTS, Born) %>%
  mutate(birth_year = year(ymd(Born))) %>%
  group_by(birth_year) %>%
  arrange(birth_year, desc(PTS)) %>%
  slice_head(n = 5) %>%
  ungroup() %>%
  data.frame()

#sets the birth year to be a factor
playerlist_in_top5_birthyear$birth_year <- factor(playerlist_in_top5_birthyear$birth_year)

# creating a trellis chart for top 5 players in each birth year
ggplot(playerlist_in_top5_birthyear, aes(x = reorder(Last.Name, -PTS), y = PTS, fill = birth_year)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = comma) +
  geom_text(data = playerlist_in_top5_birthyear, aes(label = scales::comma(PTS), x = Last.Name, y = PTS, fill = NULL), vjust = -0.5, size = 5, color = "black") +
  labs(title = "Multiple Bar Charts - Top 5 Players in Top 8 Populated Birth Years",
       x = "NHL Players",
       y = "Total Points",
       fill = "Year") +
  scale_fill_brewer(palette = "Set2") +
  facet_wrap(~birth_year, ncol=4, nrow = 2, scales = "free_x")

Insight: This trellis chart takes a broader look at the top point leaders in the NHL during the 2016-2017 season. It shows the top 5 point leaders in each of the 8 birth years with the most players in the dataset. I wanted to see how the point leaders broke down across individual birth years, so I counted the players born in each year, kept the 8 most common birth years, and plotted each player’s last name on the x-axis and points on the y-axis. Birth years 1987, 1991, 1993, and 1995 all have players who separated themselves from the rest of their group. In 1989 and 1992, the players scored around the same number of points. This visualization can lend insight into which birth years are becoming more dominant around the NHL. One possible reading is that older, more experienced players score more points, while players from the younger birth years are still learning how to be successful in the NHL. Fans and NHL front offices can use it to compare a player with others of the same age, which can help determine whether that player is becoming a success in the NHL.
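The selection described above has two steps: pick the most common birth years, then keep the highest scorers within each of them. The following is a minimal sketch with dplyr on made-up data, using smaller cut-offs (2 years, 2 players) than the report's 8 and 5. It uses `slice_max()` as one possible way to take the top rows per group, and the column names are hypothetical.

```r
library(dplyr)

# Made-up players; column names are hypothetical
players <- data.frame(
  last_name  = c("A", "B", "C", "D", "E", "F", "G"),
  birth_year = c(1987, 1987, 1987, 1991, 1991, 1995, 1989),
  pts        = c(89, 70, 40, 75, 60, 55, 30)
)

# Step 1: the birth years with the most players (top 2 here, top 8 in the report)
top_years <- players %>%
  count(birth_year, sort = TRUE) %>%
  slice_head(n = 2) %>%
  pull(birth_year)

# Step 2: within those years, keep the highest scorers (top 2 here, top 5 in the report)
top_by_year <- players %>%
  filter(birth_year %in% top_years) %>%
  group_by(birth_year) %>%
  slice_max(pts, n = 2, with_ties = FALSE) %>%
  ungroup()

print(top_by_year)
```

Setting `with_ties = FALSE` guarantees exactly n rows per group even when two players have the same point total, which keeps every panel the same width.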

Fourth Visualization - Total Salary Per Country by Position

# Code for the fourth visualization
# Reset the salary to numeric
nhl_df$Salary <- as.numeric(as.character(nhl_df$Salary))

# ifelse statement that labels any position other than C, LW, RW, or D as "Other Forward"
nhl_df <- nhl_df %>%
  mutate(Position = ifelse(!Position %in% c("C", "LW", "RW", "D"), "Other Forward", Position))

# wanted to get a list of total amount of salaries per country
total_salaries_by_country <- nhl_df %>%
  group_by(`Cntry`) %>%
  summarise(TotalSalary = sum(Salary))

# wanted to get the top5 salaries per country
top5_salaries_by_country <- nhl_df %>%
  select(Cntry, Position, Salary) %>%
  group_by(Cntry) %>%
  summarise(TotalSalary = sum(Salary)) %>%
  arrange(desc(TotalSalary)) %>%
  head(5)

# made a variable of the top5 countries by salaries
selected_top5_countries <- top5_salaries_by_country$Cntry

# wanted to get the top 5 countries salaries per position
total_salaries_by_country_by_position <- nhl_df %>%
  filter(Cntry %in% selected_top5_countries) %>%
  select(Cntry, Position, Salary) %>%
  group_by(Cntry, Position) %>%
  summarise(TotalSalary = sum(Salary))

# set x axis labels for the graph
xlab <- seq(0, max(top5_salaries_by_country$TotalSalary)/1e6, by = 100)
my_labels <- paste0("$", xlab, "M")

# used ggplot to make stacked bar chart by countries as the salary is x-axis and each fill is a different position
ggplot(total_salaries_by_country_by_position, aes(x = TotalSalary/1e6, y = reorder(Cntry, -TotalSalary), fill = Position)) +
  geom_bar(stat = "identity", position = position_stack(reverse = FALSE)) + 
  coord_flip() + 
  theme_light() + 
  labs(title = "Total Salary Per Country By Position", x = "Salary", y = "Country", fill = "Position") + 
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_fill_brewer(palette = "Accent", guide = guide_legend(reverse = FALSE)) +
  geom_text(data = top5_salaries_by_country, aes(label = sprintf("$%.2fM", TotalSalary/1e6), x = TotalSalary/1e6, y = Cntry, fill = NULL), vjust = -0.5, size = 3, color = "black") +
  scale_x_continuous(labels = my_labels, breaks = xlab, name = "Salary")

Insight: This visualization is a stacked bar chart that looks at the total salary per country by position. To get this data, I retrieved the top 5 countries by total salary paid to their players in the league. For the position field, I look at C (Center), LW (Left Wing), RW (Right Wing), and D (Defenseman). In the original data, some players were listed at more than one position, so I assigned those players a separate Position value called ‘Other Forward’. I then grouped the data by position to show how the total salary per country broke down. Canadian-born players lead the way by a wide margin: their total salary is almost twice that of players from the USA. Furthermore, forward positions (Centers, Left Wings, Right Wings, and Other Forwards) make up the majority of the total. This reflects both the presence of highly skilled forwards and the fact that there are more forwards than defensemen in the NHL. On the international front, the results are not out of the ordinary because, in international tournaments such as the Olympics and the World Championships, these top 5 countries are usually at the top of the standings. The number of players per country would be a helpful addition, since it would show how the countries compare on a per-player basis.
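A total salary figure mixes two things: how many players a country has and how much each one earns. The following is a minimal sketch, on made-up salaries, of a summary that reports the player count and the average salary next to the total. The column names follow `nhl_df` (Cntry, Salary), but the values are hypothetical.

```r
library(dplyr)

# Made-up salaries in dollars; country codes follow the Cntry column
salaries <- data.frame(
  Cntry  = c("CAN", "CAN", "CAN", "USA", "USA", "SWE"),
  Salary = c(5e6, 1e6, 3e6, 4e6, 2e6, 7e6)
)

country_summary <- salaries %>%
  group_by(Cntry) %>%
  summarise(
    Players     = n(),                       # head count per country
    TotalSalary = sum(Salary, na.rm = TRUE), # what the stacked bars show
    AvgSalary   = mean(Salary, na.rm = TRUE),# salary per player
    .groups = "drop"
  ) %>%
  arrange(desc(TotalSalary))

print(country_summary)
```

In this toy data the country with the largest total has the lowest average, which is the kind of pattern the total alone would hide.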

Fifth Visualization - Player Count by Team

#data frame that gets the number of players per team
players_per_teams <- nhl_df %>%
  mutate(Team = str_replace(Team, ".*\\/", "")) %>%
  select(Team) %>%
  group_by(Team) %>%
  summarise(n = length(Team)) %>%
  data.frame()

#set variables of high and low numbers for ggplot
hi_lo <- players_per_teams %>%
  filter(n == min(n) | n == max(n)) %>%
  data.frame()

#plots the number of players to each team
ggplot(players_per_teams, aes(x = Team, y = n, group = 1)) + 
  geom_line(color = 'black', linewidth = 1) + 
  geom_point(shape = 21, size = 4, color = 'purple', fill = 'white') +
  labs(x = "Team", y = "Number of Players", title = "Player Count by Team") +
  scale_y_continuous(labels = comma, breaks = seq(min(players_per_teams$n), max(players_per_teams$n), by = 2)) +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5)) +
  geom_point(data = hi_lo, aes(x = Team, y = n), shape = 21, size = 4, fill = 'purple', color = 'white') +
  geom_label_repel(aes(label = ifelse(n == max(n) | n == min(n), scales::comma(n), "")), box.padding = 2, 
                   point.padding = 2, size = 4, color = 'grey50', segment.color = 'darkblue')

Insight: This visualization is a line graph that shows the total number of players per NHL team during the 2016-2017 NHL season. An important caveat is that the dataset contains only 612 players who played a game in the NHL that season, whereas around 888 players actually did. This matters because an NHL team normally dresses 20 players for a game. In the data, the Chicago Blackhawks and the Edmonton Oilers have the fewest players, with 15 each. Since 15 players cannot fill a 20-player game lineup, players are clearly missing from Edmonton and Chicago. The Nashville Predators led the way with 28 players used during the season. With more than 270 players missing, this data isn’t the best to rely on, but it can still paint a reasonable picture of how each team utilized players. Teams that use lots of players could be teams that dealt with major injury problems throughout the season, or they could be rebuilding teams that want to give younger players a chance to play in an NHL game.
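The coverage gap can be quantified with a few lines of arithmetic. The sketch below uses the 612 rows reported for the dataset and the figure of about 888 players cited from quanthockey.com. The per-team counts for Chicago, Edmonton, and Nashville come from the chart, while the Boston count is a hypothetical value added for contrast.

```r
players_in_data   <- 612  # rows reported for the dataset
players_in_league <- 888  # approximate figure cited from quanthockey.com

missing_players <- players_in_league - players_in_data
coverage <- players_in_data / players_in_league  # about 0.69

# Per-team counts: CHI, EDM, and NSH are from the chart; BOS is hypothetical
counts <- c(CHI = 15, EDM = 15, NSH = 28, BOS = 21)

# A team needs 20 players to fill a game lineup, so any team below
# that count must be missing players in the dataset
min_dressed <- 20
under_covered <- names(counts)[counts < min_dressed]

print(missing_players)
print(round(coverage, 2))
print(under_covered)
```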

Sixth Visualization - Percentage Per Draft Round

#creating data frame for nhl players drafted and the count
draft_rd_count <- count(nhl_df, DftRd)
draft_rd_count <- draft_rd_count[order(-draft_rd_count$n), ]

#sets the count to numeric and calculates each round's percentage of all players
draft_rd_count$n <- as.numeric(draft_rd_count$n)
draft_rd_count$percent <- round(draft_rd_count$n / sum(draft_rd_count$n) * 100, 2)

#creates the data frame that will be used in the ggplot
draft_rd_df <- draft_rd_count %>%
  select(DftRd, n, percent) %>%
  group_by(DftRd) %>%
  ungroup() %>%
  data.frame()

#creates the pie chart for player count per draft round
ggplot(data = draft_rd_df, aes(x = "", y = n, fill = DftRd)) +
  geom_bar(stat = "identity", position = "fill") + 
  coord_polar(theta = "y", start = 0) +
  labs(fill = "Draft Round", x = NULL, y = NULL, title = "Percentage Per Draft Round",
       caption = "Slices under 5% are not labeled") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text = element_blank(), 
        axis.ticks = element_blank(), 
        panel.grid = element_blank()) + 
  scale_fill_brewer(palette = "Paired") +
  geom_text(aes(x=1.7, label = ifelse(percent>5, paste0(percent, "%"), "")),
            size = 4, position = position_fill(vjust = 0.5))

Insight: This visualization is a pie chart that shows the share of players in the dataset drafted in each round. There were many NA values in the draft columns, which indicated that those players were undrafted, so I changed those values to Undrafted. First-round draft picks make up the largest share of players, which suggests that they have largely performed as expected. Players drafted in the 2nd round make up around 17% of the players. Undrafted players make up around 16%, possibly because certain players take longer to reach their peak. There is then a big drop-off, as 3rd-round picks come in 4th place with around 9% of the players. I didn’t label slices under 5% because the labels would have overlapped. NHL front offices can use this data to evaluate how much a draft pick is worth at the NHL Entry Draft. Each team is known for having a list that values how much a draft pick is worth to them. Based on the data, NHL teams should be more willing to trade later-round draft picks, as the chances of those players making the NHL are slim. This dataset could also serve as motivation for players who do not get drafted, as it shows there is still hope of cracking an NHL roster.
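The percentages behind the pie chart are each round's count divided by the total number of players. The following is a minimal base R sketch on a made-up vector of draft rounds, including the rule that hides labels for slices of 5% or less. The sample values are hypothetical.

```r
# Made-up draft rounds for ten players
draft_round <- c("1", "1", "1", "2", "2", "Undrafted", "3", "1", "Undrafted", "2")

# Count players per round, then convert counts to percentages of the total
round_counts <- table(draft_round)
round_pct <- round(100 * prop.table(round_counts), 2)

# Label only slices above 5%, mirroring the ifelse() inside geom_text()
slice_labels <- ifelse(round_pct > 5, paste0(round_pct, "%"), "")

print(sort(round_pct, decreasing = TRUE))
print(slice_labels)
```

Dividing by the sum of all counts keeps the percentages tied to the full dataset, so they add up to 100 apart from rounding.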

Seventh Visualization - Number of Players Per Team by Round Drafted

#creates data frame that will be used for ggplot
team_draft_rd_count <- nhl_df %>%
  mutate(Team = str_replace(Team, ".*\\/", "")) %>%
  select(Team, DftRd) %>%
  group_by(Team, DftRd) %>%
  summarise(n = length(DftRd)) %>%
  data.frame()

#creates breaks with total number of players
breaks <- c(seq(0, max(team_draft_rd_count$n), by = 2))

#heatmap to display team player count per draft round
ggplot(team_draft_rd_count, aes(x = Team, y = DftRd, fill=n)) +
  geom_tile(color='black') +
  geom_text(aes(label=comma(n))) +
  coord_equal(ratio = 1) +
  labs(title = "Heatmap: Number of Players per Team by Round Drafted In", 
       x = "Teams", 
       y = "Round Drafted", 
       fill = "Number of Players") +
  theme_dark() + 
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_discrete(labels = c("1" = "1st round",
                              "2" = "2nd round", 
                              "3" = "3rd round", 
                              "4" = "4th round",
                              "5" = "5th round", 
                              "6" = "6th round",
                              "7" = "7th round",
                              "8" = "8th round", 
                              "9" = "9th round")) +
  scale_fill_continuous(low = "white", high = "red", breaks = breaks) +
  guides(fill = guide_legend(reverse = TRUE, override.aes=list(colour = "black")))

Insight: This heatmap visualizes the number of players per team based on their draft round. I used the draft round for the y-axis and NHL teams for the x-axis. According to the data, most teams’ rosters consist primarily of first-round, second-round, and undrafted players. An interesting observation is how few teams have players drafted in the 8th or 9th rounds, likely because only players drafted before 2005 could have been selected in those rounds: the NHL reduced the draft to seven rounds starting in 2005. The Minnesota Wild led with 12 first-round draft picks, while the New York Islanders and Winnipeg Jets had ten each. However, having numerous first-round picks doesn’t guarantee success, as both the Islanders and Jets missed the playoffs that season. This heatmap underscores how hard it is to earn an NHL roster spot, for drafted and undrafted players alike.

Eighth Visualization - Map of NHL Teams

#reading in another csv to the data project
file2 <- "TeamLocations.csv"
team_data_df <- fread(file2)

nhl_df <- nhl_df %>%
  mutate(Team = str_replace(Team, ".*\\/", "")) 

team_mapping_df <- data.frame(
  Abbreviated_Team = c("ANA", "ARI", "BOS", "BUF", "CGY", "CAR", "CHI", "COL", "CBJ", "DAL", "DET", "EDM", 
                       "FLA", "L.A", "MIN", "MTL", "NSH", "N.J", "NYI", "NYR", "OTT", "PHI", "PIT", 
                       "S.J", "STL", "T.B", "TOR", "VAN", "WSH", "WPG"),
  Full_Team = c("Anaheim Ducks", "Arizona Coyotes", "Boston Bruins", "Buffalo Sabres", "Calgary Flames", "Carolina Hurricanes", "Chicago Blackhawks", "Colorado Avalanche", "Columbus Blue Jackets", "Dallas Stars", "Detroit Red Wings", "Edmonton Oilers", 
                "Florida Panthers", "Los Angeles Kings", "Minnesota Wild", "Montreal Canadiens", "Nashville Predators", 
                "New Jersey Devils", "New York Islanders", "New York Rangers", "Ottawa Senators", "Philadelphia Flyers", 
                "Pittsburgh Penguins", "San Jose Sharks", "St. Louis Blues", "Tampa Bay Lightning", "Toronto Maple Leafs", "Vancouver Canucks", 
                "Washington Capitals", "Winnipeg Jets"))

# joining just the full team names to nhl_df
nhl_df_full_teams <- left_join(nhl_df, team_mapping_df, by = c("Team" = "Abbreviated_Team"))

#Setting up colnames without quotes for team_data_df
colnames(team_data_df) <- c("Team", "Conference", "Division", "Latitude", "Longitude")

#removed whitespace as well as quotes in the csv file
team_data_df$Team <- trimws(team_data_df$Team, which = "both")
team_data_df$Longitude <- trimws(team_data_df$Longitude, which = "both")

team_data_df$Team <- substring(team_data_df$Team, 2)
team_data_df$Longitude <- as.numeric(substring(team_data_df$Longitude, 1, nchar(team_data_df$Longitude)-1))

# Now, left join nhl_df_full_teams with team_data_df based on full team names
nhl_df_combined_data <- left_join(nhl_df_full_teams, team_data_df, by = c("Full_Team" = "Team"), relationship = "many-to-many")

# code to draw map of NHL Teams
nhl_map <- leaflet() %>%
  addProviderTiles(providers$Esri.WorldTopoMap) %>%
  addCircles(
    data = subset(nhl_df_combined_data, Conference == 'Western' & Division == 'Pacific'),
    lng = ~Longitude,
    lat = ~Latitude,
    opacity = 1,
    color = 'black', 
    popup = ~Team,
    radius = 100
  ) %>%
  addCircles(
    data = subset(nhl_df_combined_data, Conference == 'Western' & Division == 'Central'),
    lng = ~Longitude,
    lat = ~Latitude,
    opacity = 1,
    color = 'orange', 
    popup = ~Team,
    radius = 100
  ) %>%
  addCircles(
    data = subset(nhl_df_combined_data, Conference == 'Eastern' & Division == 'Atlantic'),
    lng = ~Longitude,
    lat = ~Latitude,
    opacity = 1,
    color = 'red', 
    popup = ~Team,
    radius = 100
  ) %>%
  addCircles(
    data = subset(nhl_df_combined_data, Conference == 'Eastern' & Division == 'Metropolitan'),
    lng = ~Longitude,
    lat = ~Latitude,
    opacity = 1,
    color = 'darkgreen', 
    popup = ~Team,
    radius = 100
  )
nhl_map

Insight: This visualization is a map that pins each NHL team to its home arena. To get this data, I used TeamLocations.csv, which I read in as the second file, and joined a modified version of nhl_df to it to pinpoint the NHL teams. During the 2016-2017 season, the NHL had 30 teams: 23 in the US and 7 in Canada. The NHL has added two more teams since that season, but understanding where teams are located could help fans get more interested in their hometown team. The majority of the teams are in the east, reflecting the population concentrated along the east coast of both countries (USA and Canada). Using the leaflet library, I was able to show the user where each team is located. I separated teams by division, and each division has its own marker color: the Atlantic Division is red, the Metropolitan Division is green, the Pacific Division is black, and the Central Division is orange.
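The four near-identical `addCircles()` calls in the map code could also be expressed as one call with a color palette keyed on the division. The sketch below is one possible way to do that with leaflet's `colorFactor()`, not the approach used in this report. It uses a small hypothetical table with one team per division and approximate arena coordinates rather than TeamLocations.csv.

```r
library(leaflet)

# Hypothetical stand-in for the joined team data: one team per division,
# with approximate arena coordinates
teams <- data.frame(
  Full_Team = c("Boston Bruins", "New York Rangers", "Anaheim Ducks", "Chicago Blackhawks"),
  Division  = c("Atlantic", "Metropolitan", "Pacific", "Central"),
  Latitude  = c(42.366, 40.751, 33.808, 41.881),
  Longitude = c(-71.062, -73.993, -117.877, -87.674)
)

# Map each division to the color used in the report
division_pal <- colorFactor(
  palette = c("red", "darkgreen", "black", "orange"),
  levels  = c("Atlantic", "Metropolitan", "Pacific", "Central")
)

# One addCircles() call colors every team by its division
nhl_map_sketch <- leaflet(teams) %>%
  addProviderTiles(providers$Esri.WorldTopoMap) %>%
  addCircles(
    lng = ~Longitude,
    lat = ~Latitude,
    color = ~division_pal(Division),
    opacity = 1,
    radius = 100,
    popup = ~Full_Team
  )

nhl_map_sketch
```

Passing one row per team, rather than one row per player, also avoids drawing many circles on top of each other at the same arena.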

Wrap up

In conclusion, these visualizations provide a comprehensive analysis of the NHL, covering player performance, team dynamics, draft trends, and geographical distribution. While each visualization offers valuable insights into specific aspects of the league, there are opportunities to improve clarity, context, and presentation to maximize their effectiveness in conveying information to stakeholders. By refining visualization techniques and providing clearer explanations of data trends, these visualizations can serve as powerful tools for fans, management, and NHL front offices alike. They can inform strategic decision-making processes and facilitate a deeper understanding of the intricacies of the world of professional hockey.